home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
SPACE 1
/
SPACE - Library 1 - Volume 1.iso
/
program
/
386
/
arpbook7
/
chap_7.doc
Wrap
Text File
|
1989-03-30
|
125KB
|
2,996 lines
Atari ST Machine Specific Programming In Assembly
Chapter 7: An Adaptive Algorithm
Algorithmic Intelligence
By saying to you now that I consider every item of
software to be intelligent to at least some degree, I
restrict the range of my future comments concerning any
comparisons between the levels of intelligence of items of
software to the extent that I will not be able to refer to
one item as intelligent and another as being devoid of
intelligence. My comments would not be as restricted if
there were a system of intelligent quotients for software,
because then I would be able to assign an IQ to each item
under discussion in order to specify its relative level of
intelligence.
I am not aware of the existence of such a system and I
have no desire to develop one myself, at least not at this
time. Therefore, in order to differentiate between levels
of software intelligence, I must use descriptive phrases
such as smarter than or more intelligent than. I place
these limits on the terminology I permit myself to use in
this chapter because I want to avoid extensive,
nonproductive philosophical discussions of relative levels
of intelligence.
Most of the algorithms that I have discussed so far
have been designed to perform their functions in an
unsophisticated fashion, without much regard for external
stimuli. Notable exceptions are the custom traps which
adjust their output based on the number of digits or the
number of leading zeroes in a passed parameter and trap #8,
which executes a wait for keypress algorithm only if the
invoking process was not spawned by another. To the degree
that response to external stimuli is a measure of
intelligence, those custom traps which adapt their output
according to input are more intelligent than those which
simply alter a memory location, such as does trap #0, or
return a value, such as does trap #3.
Much as I have declared custom traps which alter their
output in response to their input to be the more intelligent
in the paragraph above, I shall simply declare that an
algorithm which can alter itself in response to external
stimuli to be more intelligent than one which cannot do so.
And, in order to reduce the length of descriptive phrases, I
shall refer to such algorithms as adaptive algorithms. In
fact, I shall simply declare that the term adaptive applies
to any algorithm which can alter itself in response to any
stimuli, whether it be externally or internally generated.
If you have read the suggested material in Christopher
Evans' The Micro Millennium, I believe that you will grasp
my line of reasoning and deduce that I am laying the
foundation for a discussion of an item of software to which
Mr. Evans' six major constituents of intelligence may be
applied. I am not concerned about adhering religiously to
his definitions, nor do I wish to confirm or refute his
conclusions. My only concern is that you understand the
discussions in this chapter, and I feel that I could not
have found a better reference than his for your perusal.
I am very aware of the hundreds, perhaps even
thousands, of books which have been jammed on bookshelves in
recent years simply because a number of people with nothing
to say realized that money could be had if it was said in a
book bearing the words Artificial Intelligence. And if one
of these books does actually contain useful information, I
shall probably never know of it because I have vowed not to
touch a book bearing those words on its front cover. In
fact, I would consider myself honored if you would now ink
over those words in the sentence above and pretend that I
never wrote them.
I have recommended Mr. Evans' book because it is not at
all like those others that are filled with buzz word
gibberish copied from articles or other books filled with
identical gibberish. I have enjoyed the Evans book
immensely, and I have just finished reading it for the third
time, so as to have it freshly in mind as I write this
chapter. If you disagree with my choice of references, I
can live with that, but I would be negligent if I did not
suggest to you the material which has had the greatest
influence on the material which I present.
Onanism Corruptus: The Sin of Self-Modification
At this time, there is a notion that I would like you
to store someplace in a corner of your mind, and, from time
to time, as I spend my energy in conveying to you the
concept and the reality of a software tool that I believe is
as powerful as the concept and reality of the computer
itself, I would like you to recall that notion and consider
the limiting effect that would be placed on software utility
were it to be universally enforced. This notion is
expressed most succinctly on page 92 of 680x0 Programming by
Example, written by Stan Kelly-Bootle, under the heading
Logic Behind Rule C. The precise sentence which I would
like you to remember, if for no other reason than that it
alone justifies the existence of this entire chapter, is
that which follows:
"Motorola discourages self-modifying code, so they
make it difficult, but not impossible, for a program
to write over itself."
I have chosen to use Mr. Kelly-Bootle's remarks, not because
I consider his to be any more naive than those of other
authors concerning the same subject. Rather, I have chosen
his book to be a very reputable source; to what end would I
choose a non-reputable source?
I cannot dispute the sincerity of Mr. Kelly-Bootle's
remarks, nor can I verify or refute his claim of Motorola's
concerns and intentions. But I most certainly can say that
the notion of a microprocessor manufacturer deliberately
hindering the processing power of a microprocessor leaves me
breathless. OK, that's a little strong. Actually, the
vehemency expressed in that sentence is somewhat pretended,
and is merely a ploy which permits me to ventilate my
opinion about the notion while indulging in a little private
sick humor.
But there is nothing sick about the concept of self-
modifying code, and there is nothing humorous about the idea
of making it difficult. Because it is this tool alone which
permits the intelligence level of software to rise above
mediocrity. And say what you will about the power of the
human brain, in some areas in is overwhelmingly bested by
even mediocre software; and without the ability to self-
modify its programs, the human brain would be about as
intelligent as a computer program with the same limitation.
Unlike the sin of self-gratification, the sin of self-
modification will not cause blindness; however, it can lead
to coitus interruptus. This condition becomes evident when
a row of bombs appear across the video screen. It means, of
course, that your self-modifying program has screwed things
up enough to cause the generation of a software interrupt.
I think that it is safe for me to say that the more
intelligence you insert into your algorithms, the more
likely they are to acquire the attribute of
unpredictability. So, if you permit them to engage in the
perverse activity of self-modification, you had better be
prepared to protect yourself from the consequences if they
decide to run amuck. Of course, your first defensive action
involves quality control. You can maintain rigid control by
thoroughly understanding each instruction used in an
adaptive algorithm and by being able to precisely predict
the effects upon the entire computer system as each is
executed.
Scope of Chapter 7 and Relevant Considerations
The algorithm that is the focus of this chapter is a
custom trap that reconstructs itself in a disciplined manner
to form an execution loop about an algorithm under test
(AUT). The address of the AUT is passed to the custom trap
by an invoking program. The trap copies the AUT to a area
of memory that is reserved within the boundaries of the
trap, constructs the execution loop, implements the loop and
generates appropriate performance data concerning the AUT.
According to the definition previous established, the
algorithm to be developed in this chapter is an adaptive
algorithm.
As far as I know, there are two ways in which an
adaptive algorithm can modify itself: it can generate
executable code within its execution space or it can copy
executable code from a designated location into that space.
Of course, there are no restrictions concerning combinations
of the two methods of adaptation. In this discussion, I do
not consider the simple alteration of declared variables to
be adaptive, although I can understand why some would label
this as self-modifying. Neither do I consider the
alteration of an algorithm's code by an outside agent to be
adaptive. When I say that an algorithm is adaptive, I mean
that its structure permits it to modify its structure.
While explaining the design and operation of the
adaptive algorithm, I will mention all of the considerations
that apply to the programming environment that I have
adopted; that is the AssemPro environment. These may not
apply when other environments are used. Furthermore, it
should be understood that I will mention the considerations
of which I am aware; I can't promise to be infallible.
As you know, AssemPro provides more than one assembly
option. I have shown you some of the effects on produced
code by each. As you shall see, the mode used to assemble
the adaptive algorithm partly determines its structure. In
addition, there is the mode of assembly used to generate the
AUT's code to consider. The results generated by the
adaptive algorithm are determined to some extent by the
assembly mode used on both the adaptive algorithm and the
algorithm under test.
Initial Structure of the Adaptive Algorithm
Figure 7.1A is a representation of the structure of the
adaptive algorithm as it exists in memory before it has ever
been invoked. The structure is defined by four distinct
sections: an initialization section; a fixed section which
installs the AUT, replicates the fourth section and
initiates execution of the loop; a block of memory in which
the AUT and the replicated fourth section are to reside; and
the section which contains the loop branch instruction, code
to generate the performance report and a data section.
Figure 7.1A. Structure of the adaptive algorithm prior to
initial invocation.
Figure 7.1B illustrates the structure of the program
that contains the algorithm to be tested. The hands in this
figure, and those in figure 7.1C, are there to represent the
movement of the algorithm under test from the invoking
program to the adaptive algorithm. These pictorials are
actually perverse demonstrations of what does not happen. I
wish to graphically impress you with the fact that something
much more powerful than mere movement of data is involved in
this and in similar data transactions. The code which forms
the algorithm under test is not moved; it is copied. I feel
that the implications of this fact is deluded by the
instruction used to perform the transaction. The situation
would be expressed much more clearly if the mnemonic for the
instruction were copy instead of move. Nevertheless, I too
use the word move when I mean copy because that is the
terminology I was taught. As long as one remembers that
this transaction we call move is non-destructive, and that
the copied data remains intact and can be cloned as often as
desired, either word can be used to indicated the
transaction which actually takes place.
Figure 7.1B. Structure of the adaptive algorithm invoking
program.
The thoughts concerning the copying of data, versus the
movement of data, expressed about figure 7.1B also apply to
figure 7.1C. There you see the AUT being moved into the
space reserved within the adaptive algorithm. The section
of reserved space under the block pictorial of the AUT
implies that the AUT is small enough so that it never fills
the reserved space. Of course, the length of the reserved
space is programmable within the adaptive algorithm, and, in
fact, the length of the reserved space could be set
dynamically at run-time.
Figure 7.1C. Algorithm under test being installed within the
reserved space of the adaptive algorithm.
After the AUT has been copied into the reserved space,
the remaining instructions which form the adaptive algorithm
and its data must be moved up into the reserved space,
adjacent to the AUT. The reason for this will be clearer
when you view the source and the disassembly listing, but,
briefly, this is done so that the loop branching instruction
and attendant data is placed appropriately within the newly
defined adaptive algorithm structure. Figure 7.1D
illustrates the appearance of a portion of the new
structure. An important section has been omitted from this
drawing.
Figure 7.1D. Closing the gap between the algorithm under
test and the fourth section to form the executable loop and
attendant code.
Figure 7.1E is a pictorial of the adaptive algorithm
structure after its initial invocation. There you see the
section that was omitted from figure 7.1D. Remember that
the Movable Loop Section and Data Section is not moved; it
is copied; therefore, the original section of code remains
in place, until it is removed deliberately or until the
adaptive algorithm is removed from memory. There is no
reason that instructions to remove that section could not be
included in the adaptive algorithm. I did not include such
instructions in this algorithm because the movable sections
are replicated during each invocation of the trap;
therefore, they must remain intact. The structure shown in
figure 7.1E is valid for all future invocations of the
adaptive algorithm. The initially installed AUT and
attendant code is simply overwritten by future invocations.
Figure 7.1E. Structure of the adaptive algorithm after the
initial invocation and during all subsequent invocations.
I have designed the adaptive algorithm as a custom trap
so that it need simply be invoked by a program containing an
algorithm for which the execution speed and requisite memory
is desired. When I began to design this trap, I realized
that I could not anticipate all possible ramifications of
AUT's assembled in the various assembly modes that I have
discussed, nor could I apriorily judge the most suitable
mode of assembly for the adaptive algorithm. Therefore, I
prepared two programs: one designed for PC-relative
assembly, the other for Relocatable assembly. Program 34 is
a listing of the adaptive algorithm designed for PC-relative
assembly. The program installs the adaptive algorithm as
custom trap #9. A routine with the same trap number was
used during the development of the SPEEDTST utility, but its
usefulness in that function has expired, so it would be best
not to have the older trap #9 routine in a position that
might cause confusion about which routine is installed.
Program 34. The adaptive algorithm designed as a custom
trap. This particular program has been designed for
assembly in the PC-relative mode.
; Program Name: TRAP_9P.S
; Version 1.009
; Assembly Instructions:
; Assemble in PC-relative mode and save with a PRG extension. TRAP_9R.S
; is a version of this program that must be assembled in Relocatable mode.
; I have prepared two versions because I can't predict the precise requirements
; of all possible algorithms which might invoke the custom trap that is
; installed by this program.
; Program Function:
; This is a LSR program that establishes a user defined trap #9. It may
; be executed from the desktop, but you may prefer to copy it to the AUTO
; folder of your boot partition or floppy disk so that it will execute
; automatically during boot.
program_start: ; Calculate program size and retain result.
lea program_end, a3 ; Fetch program end address.
suba.l 4(a7), a3 ; Subtract basepage address.
enter_supervisor_mode:
move.l #0, -(sp) ; The zero turns on supervisor mode.
move.w #$20, -(sp) ; Function = super = GEMDOS $20.
trap #1 ; Supervisor stack pointer (SSP) returned in D0.
addq.l #6, sp
movea.l d0, a5 ; Save SSP in scratch register.
install_trap_9_routine: ; Note: pointer = vector = pointer.
lea trap_9_routine, a0 ; Fetch address of trap #9 routine.
move.l a0, $A4 ; Store trap address at pointer address.
enter_user_mode:
pea (a5) ; Restore supervisor stack pointer.
move.w #$20, -(sp) ; Function = super = GEMDOS $20.
trap #1
addq.l #6, sp
relinquish_processor_control: ; Maintain memory residency.
move.w #0, -(sp) ; Exit code.
move.l a3, -(sp) ; Program size.
move.w #$31, -(sp) ; Function = ptermres = GEMDOS $31.
trap #1
trap_9_routine:
; NOTE: You may want to execute a program that invokes this trap from the
; AssemPro debugger and single step through the trap instructions, or
; you many want to set breakpoints and use the "Run program" button to
; execute to a breakpoint. It the exercise of this latter option that
; can cause problems. If you want to set a breakpoint at an instruction
; that occurs before the "reserved space", no precautions need be taken,
; except that you must realize that you could halt at a position after
; the start time has been initialized, and the execution time will be
; accumulating during the halt.
; But, if you want to set a breakpoint at at an instruction that occurs
; after the "reserved space", you should not do so until the instructions
; that perform the bulk move have been executed. If you set a breakpoint
; at an instruction in the program space which follows the "reserved
; space" and then execute the bulk move instructions, the breakpoint you
; have set will cause an illegal command error.
; This custom trap must be invoked by a program which contains one or more
; algorithms for which performance data, in the form of execution time and
; requisite memory, is desired. The trap must be invoked once for each
; algorithm under test (AUT). The trap is intended to be a performance
; report generator so that performance data for specific algorithms can be
; compared.
; If the program invoking the trap is spawned by SPAWN.TTP, the performance
; data will be printed to a file that is created by SPAWN.TTP. The name of
; the file will be the name of the spawned program with a DAT suffix.
; If the program invoking the trap is executed from the desktop, the
; performance data will be printed to the screen.
; The AUT can be assembled in PC-relative or Relocatable mode.
; The custom trap expects the following parameters to be contained in the
; specified registers prior to invocation:
; A0 = The address of a null terminated header to precede the reported
; AUT execution time.
; A1 = The address of a null terminated header to precede the reported
; AUT requisite memory.
; A3 = The starting address of an algorithm containing one or more instructions.
; This algorithm becomes the AUT upon invocation of the trap.
; A4 = The ending address of the AUT.
; A5 = At the initial invocation by a specific program, A5 is expected to
; contain the address of a null terminated heading that is a brief
; description of the data that is to be reported by the trap. During
; all subsequent invocations by that program, A5 must contain the
; address of a null string.
; D7 = A word length decimal value which specifies the number of times the
; the AUT is to be executed in a loop in order to generate the execution
; time data. Register D7 is the only register which cannot be used by
; the AUT. That's because D7 is used as the AUT execution loop counter.
; The custom trap uses the information passed in A3, A4 and D7 to construct
; a loop around the AUT. The loop is executed the number of times specified
; in register D7. The time required to execute the loop is calculated and
; printed along with the AUT's requisite memory.
; How Trap #9 Works:
; The custom trap is an intelligent, self-modifying algorithm. Prior
; to an initial invocation, the trap #9 routine is composed of three sections.
; The first section contains instructions which are permanently located. The
; second section is a 1,024 byte block of reserved space into which the AUT
; and the third section is to be copied. The third section contains the custom
; trap instructions that must be copied so that the first instruction in the
; third section immediately follows the last instruction of the algorithm
; under test. The size that I chose for the reserved space was arbitrary.
; After an invocation, the custom trap calculates the number of bytes in
; the algorithm under test. Then it initializes register D7 for the AUT's
; execution loop. After saving the addresses of the headers that were passed
; as parameters so that the registers in which they were passed can be used
; by trap #9 and/or the algorithm under test, the trap prints the AUT's
; primary heading and the execution time header. Then it copies the AUT into
; the reserved space.
; After the AUT has been copied into the reserved space, the instructions
; and data in the third section of the custom trap are moved into immediate
; proximity of the AUT so that the first instruction of section three follows
; immediately after the last instruction of the algorithm under test.
; The AUT is executed the number of times requested via the value passed
; in D7, then the execution time and requisite memory of the AUT are printed.
; After the custom trap has been invoked once, the reserved space will
; not be completely clear during the remaining portion of the power-up cycle.
; When the trap is subsequently invoked, the instructions that are already in
; the reserved space will simply be overwritten by the instructions of the
; new algorithm under test and those of the third section.
calculate_length_of_AUT:
suba.l a3, a4 ; Subtract start address from end address.
save_heading_addresses:
lea time_header, a2 ; Header to precede reported execution time.
move.l a0, (a2)
lea memory_header, a2 ; Header to precede reported requisite memory.
move.l a1, (a2)
transfer_stack_pointer:
; Upon entrance into this custom trap, A7 contains the address of the system
; stack pointer. Because the system stack may be too short for use by the AUT,
; the invoking program must provide a user stack, and, here, I am going to
; store the address of that stack in A7. Before returning to the invoking
; program, the system stack pointer must be restored so that the return address
; can be found by the processor.
lea _SSP, a0 ; Fetch address of declared variable.
move.l a7, (a0) ; Save system stack pointer.
move USP, a7 ; Transfer to invoking program's stack.
print_heading:
; NOTE: When two or more algorithms are to invoke the trap from a program,
; the heading should be printed only once, at the beginning of the
; report; therefore, the string address in A5 should point to a
; null terminated string during the first invocation of the trap, but
; it should point to a null string on subsequent invocations from the
; same program.
; For that reason, the test for NULL string is included below, so that
; only one attempt to print the heading will be made.
tst.b (a5) ; NULL string test.
beq.s print_time_header ; Branch if null.
movea.l a5, a0 ; Print heading.
bsr print_string
print_time_header:
movea.l time_header, a0 ; Fetch address of time header.
bsr print_string
build_algorithm:
lea reserved_space, a6 ; Fetch address of reserved space.
move.w a4, d3 ; AUT length initializes counter for copy.
lea memory, a0 ; Store AUT's requisite memory.
move.w d3, (a0) ; AUT length stored for report.
subq.w #1, d3 ; Initialize counter for loop execution.
move_loop: ; Move AUT, byte by byte, to the area
move.b (a3)+, (a6)+ ; declared as "reserved_space".
dbra d3, move_loop ; Branch until D3 = -1.
; A6 is now pointing to the location at which the AUT execution loop's DBRA
; instruction must be copied. The DBRA copy must be processed apart from the
; other instructions in the custom trap which must be copied to "close up" the
; gap between the last statement of the algorithm under test and the rest
; of the custom trap's instructions.
lea algorithm_end, a0 ; Calculate number of bytes in this program
lea program_end, a1 ; which must be copied into "reserved_space",
suba.l a0, a1 ; then initialize counter D3 with the number
move.w a1, d3 ; of bytes to copy.
subq.w #4, d3 ; Subtract the DBRA requisite memory.
movea.l a0, a1 ; Calculate amount the DBRA instruction must
suba.l a6, a1 ; be moved and save the new location in scratch
movea.l a6, a5 ; register for displacement correction.
move.w 2(a0), d0 ; Fetch current DBRA displacement value.
move.l (a0)+, (a6)+ ; Move the DBRA instruction.
add.w a1, d0 ; Correct DBRA displacement to the value it
move.w d0, 2(a5) ; should be after the move.
subq.w #1, d3 ; Initialize counter for bulk copy.
_move_loop: ; Move rest of algorithm in a bulk copy.
move.b (a0)+, (a6)+
dbra d3, _move_loop
subq.w #1, d7 ; Initialize D7 for AUT loop execution.
lea start_time, a0 ; Fetch address of declared variable.
suba.l a1, a0 ; Correct address to after-move location.
; NOTE: Since processor is in supervisor mode, there is no reason to use
; trap #3 to fetch time. The 200hz vector can be referenced directly.
move.l $4BA, (a0) ; Fetch and store start time.
algorithm_start: ; new location.
reserved_space: ds.b 1024 ; Space for loop construction = 1024 bytes.
algorithm_end:
dbra d7, algorithm_start
move.l $4BA, d0 ; Fetch end time.
sub.l start_time, d0 ; Subtract start time from end time.
trap #10 ; Convert to decimal milliseconds and print.
print_memory_header:
movea.l memory_header, a0 ; Fetch address of memory header.
bsr.s print_string
print_requisite_memory:
move.w memory, d1 ; Fetch AUT's requisite memory.
trap #4 ; Returns address of decimal string in A0.
bsr.s print_string
lea header_1, a0 ; Print requisite memory units label.
bsr.s print_string
; NOTE: I had thought that I might have to restore the user stack pointer
; before returning to the calling program because I can't know to what
; extent the AUT has corrupted the user stack.
; But, as it turns out, since the value in USP is stored in register
; A7 at the beginning of the program, and since the system stack pointer
; is then active and remains so for the duration of the trap invocation,
; only the value in register A7 (which is the system stack pointer, but
; which points to the user stack) is altered; the value in USP remains
; uncorrupted, therefore, nothing needs be restored but the system stack
; pointer (so that it again points to the system stack).
; On subsequent invocations the value in USP, which remains stable, will
; simply be restored in A7, as often as the trap is invoked.
movea.l _SSP, a7 ; Restore system stack pointer so that it
rte ; points to the system stack.
;
; SUBROUTINES
;
print_string:
pea (a0)
move.w #9, -(sp)
trap #1
addq.l #6, sp
rts
data
header_1: dc.b ' bytes',$D,$A,0
bss
align
_SSP: ds.l 1 ; Address of the system stack will be stored here.
start_time: ds.l 1 ; Variable for time at start of AUT processing.
memory: ds.w 1 ; Variable for AUT's requisite memory.
time_header: ds.l 1 ; Points to address of header which is to precede
; the reported AUT processing time.
memory_header: ds.l 1 ; Points to address of header which is to precede
; the reported AUT requisite memory.
program_end: ds.l 0
end
Comprehension of program 34 will be facilitated with
reference to a disassembly listing of the program as it
resides in memory. Figure 7.2 is a portion of that listing.
A section of the reserved space has been deleted in order to
reduce the size of the listing. This listing shows the code
for the trap before any program has invoked it.
Figure 7.2. Disassembly listing of TRAP_9P.PRG as it resides
in memory before an initial invocation.
TRAP_9P.PRG Before First Invocation
01FA36 99CB SUBA.L A3,A4
01FA38 45FA04B4 LEA $1FEEE(PC),A2
01FA3C 2488 MOVE.L A0,(A2)
01FA3E 45FA04B2 LEA $1FEF2(PC),A2
01FA42 2489 MOVE.L A1,(A2)
01FA44 41FA049E LEA $1FEE4(PC),A0
01FA48 208F MOVE.L A7,(A0)
01FA4A 4E6F MOVE.L USP,A7
01FA4C 4A15 TST.B (A5)
01FA4E 6706 BEQ.S $1FA56
01FA50 204D MOVEA.L A5,A0
01FA52 6100047A BSR $1FECE
01FA56 207A0496 MOVEA.L $1FEEE(PC),A0
01FA5A 61000472 BSR $1FECE
01FA5E 4DFA0046 LEA $1FAA6(PC),A6
01FA62 360C MOVE.W A4,D3
01FA64 41FA0486 LEA $1FEEC(PC),A0
01FA68 3083 MOVE.W D3,(A0)
01FA6A 5343 SUBQ.W #1,D3
01FA6C 1CDB MOVE.B (A3)+,(A6)+
01FA6E 51CBFFFC DBRA D3,$1FA6C
01FA72 41FA0432 LEA $1FEA6(PC),A0
01FA76 43FA047E LEA $1FEF6(PC),A1
01FA7A 93C8 SUBA.L A0,A1
01FA7C 3609 MOVE.W A1,D3
01FA7E 5943 SUBQ.W #4,D3
01FA80 2248 MOVEA.L A0,A1
01FA82 93CE SUBA.L A6,A1
01FA84 2A4E MOVEA.L A6,A5
01FA86 30280002 MOVE.W 2(A0),D0
01FA8A 2CD8 MOVE.L (A0)+,(A6)+
01FA8C D049 ADD.W A1,D0
01FA8E 3B400002 MOVE.W D0,2(A5)
01FA92 5343 SUBQ.W #1,D3
01FA94 1CD8 MOVE.B (A0)+,(A6)+
01FA96 51CBFFFC DBRA D3,$1FA94
01FA9A 5347 SUBQ.W #1,D7
01FA9C 41FA044A LEA $1FEE8(PC),A0
01FAA0 91C9 SUBA.L A1,A0
01FAA2 20B804BA MOVE.L $4BA,(A0)
01FAA6 00000000 ORI.B #0,D0
01FAAA 00000000 ORI.B #0,D0
NOTE: Section of reserved space omitted
to reduce size of listing.
01FE9E 00000000 ORI.B #0,D0
01FEA2 00000000 ORI.B #0,D0
01FEA6 51CFFBFE DBRA D7,$1FAA6
01FEAA 203804BA MOVE.L $4BA,D0
01FEAE 90BA0038 SUB.L $1FEE8(PC),D0
01FEB2 4E4A TRAP #$A
01FEB4 207A003C MOVEA.L $1FEF2(PC),A0
01FEB8 6114 BSR.S $1FECE
01FEBA 323A0030 MOVE.W $1FEEC(PC),D1
01FEBE 4E44 TRAP #4
01FEC0 610C BSR.S $1FECE
01FEC2 41FA0016 LEA $1FEDA(PC),A0
01FEC6 6106 BSR.S $1FECE
01FEC8 2E7A001A MOVEA.L $1FEE4(PC),A7
01FECC 4E73 RTE
01FECE 4850 PEA (A0)
01FED0 3F3C0009 MOVE.W #9,-(A7)
01FED4 4E41 TRAP #1
01FED6 5C8F ADDQ.L #6,A7
01FED8 4E75 RTS
01FEDA 2020 MOVE.L -(A0),D0
01FEDC 6279 BHI.S $1FF57
01FEDE 7465 MOVEQ #$65,D2
01FEE0 730D DC.W $730D
01FEE2 0A00FF0A EORI.B #$A,D0
01FEE6 00000000 ORI.B #0,D0
01FEEA 00000000 ORI.B #0,D0
01FEEE 00000000 ORI.B #0,D0
01FEF2 00000000 ORI.B #0,D0
Analysis of TRAP_9P
The analysis of program 34 begins at the label
algorithm_start in the source listing, at address 01FAA6 in
the disassembly listing. The label reserved_space in the
source listing is superfluous, serving only to indicate that
the 1024 bytes of defined space is sandwiched between the
two labels algorithm_start and algorithm_end, which locate
the boundaries of the algorithm under test after it has been
installed. Note that neither the data nor the bss assembler
pseudo-ops are used before the ds.b 1024 pseudo-op. If the
data and bss pseudo-ops are used in conjunction with the
ds.b pseudo-op to define the reserved space, AssemPro will
locate the 1024 bytes of defined space in the program's data
section; therefore, the space reserved would not be between
the labels that mark the boundaries of the AUT. That
relocation of the reserved space is not necessarily
prejudicial to the structure of an adaptive algorithm if it
is considered during the design of the algorithm; but it is
important to know the precise effect of every instruction
and pseudo-op when dealing with self-modifying algorithms.
Moving the DBRA Instruction
Refer to the dbra instruction at address 01FEA6 in the
disassembly listing. The disassembler displays the entire
instruction in a format that is pleasing to the eye, DBRA
D7, $1FAA6, because it enables us to readily discern the
value that will be loaded into the program counter when the
branch is taken. But this convenience tends to obscure the
fact that the instruction exists in memory as 51CFFBFE,
which is not as pretty, but which more accurately portrays
the activity involved in assembling and executing the dbra
instruction. To wit: the assembler calculates a signed
displacement between the dbra instruction and a label and
stores this displacement in the extension word which follows
the instruction word. In this particular case, the
displacement is $FBFE = -1026 (decimal).
The important thing to realize is that the displacement
is fixed during assembly and will not change when the dbra
instruction is copied into the reserved space, following the
insertion of the AUT. But the distance between address
01FAA6 and the dbra instruction will be less then than it is
now; therefore, something must be done to alter the value in
the dbra extension word after the instruction has been
copied into the reserved space.
Within the source listing, following the two
instructions after the move_loop label, there is a note
concerning the movement of the dbra instruction into the
reserved space. It is the necessity for the displacement
correction which requires that the dbra move be accomplished
in a single transaction, apart from the movement of the
other code comprising the movable sections. The source
listing documentation which accompanies the instructions
following the note explains the steps involved in
calculating a new displacement value, but since some of
those instructions are specific to the movement of the code
following the dbra instruction, I will repeat the steps here
so that cognition may be confirmed.
1. Calculate the total number of bytes of code which
must be copied into the reserved space after the AUT
has been installed. This value is the number of
bytes existing between the label algorithm_end and
the end of the program. It is calculated in register
A1, then copied into register D3, which is used as a
loop counter when all but the dbra instruction
portion of the movable sections are copied into the
reserved space.
2. Because the dbra instruction will be moved apart
from the other instruction in the section, its
requisite memory (4 bytes) must be subtracted from
the counter, D3, before the bulk copy takes place.
3. Calculate the displacement between the dbra
instruction's current location and the location at
which it is to be copied. The dbra instruction's
current location is marked by the label
algorithm_end, which is still stored in register A0.
The value in A0 must be preserved for future
calculations, therefore, it is copied into A1. The
value in register A6 is the location to which the
dbra instruction must be moved. Subtracting A6 from
A1 puts the number of bytes that the dbra
instruction must be moved into A1. The extension
word correction to follow requires that the value in
A6 be preserved, so it is copied into register A5.
4. Fetch the current dbra displacement value. In the
source listing, you can see that this value is
stored in register D0.
5. Move (or copy) the dbra instruction into its new
location.
6. Correct the dbra instruction's displacement value.
The correction must be a positive value because the
distance between the branch instruction and the
branch location has decreased. Register A1 contains
the previously calculated correction, which is
simply the number of bytes that the dbra instruction
was moved. In the source listing, you can see that
the positive value in A1 is added to the negative
value in D0 to obtain the new displacement value.
Then the contents of D0 are written to the dbra
extension word's new location at 2(A5) with a self-
modifying instruction.
The effect of the above steps can be seen in figure
7.3, a disassembly listing of program 34 after a single
instruction AUT, MULU #5,D0, has been installed in the
reserved space, at address 01FAA6. The new dbra
displacement is shown in the instruction following; it is
$FFFA = -6 (decimal). Within the listing are several notes,
shown in boldface, that explain specific instruction groups.
Figure 7.3 Disassembly listing of the adaptive algorithm
portion of TRAP_9P.PRG as it resides in memory after the
initial invocation.
TRAP_9P.PRG After First Invocation
01FA36 99CB SUBA.L A3,A4
01FA38 45FA04B4 LEA $1FEEE(PC),A2
01FA3C 2488 MOVE.L A0,(A2)
01FA3E 45FA04B2 LEA $1FEF2(PC),A2
01FA42 2489 MOVE.L A1,(A2)
01FA44 41FA049E LEA $1FEE4(PC),A0
01FA48 208F MOVE.L A7,(A0)
01FA4A 4E6F MOVE.L USP,A7
01FA4C 4A15 TST.B (A5)
01FA4E 6706 BEQ.S $1FA56
01FA50 204D MOVEA.L A5,A0
01FA52 6100047A BSR $1FECE
01FA56 207A0496 MOVEA.L $1FEEE(PC),A0
01FA5A 61000472 BSR $1FECE
01FA5E 4DFA0046 LEA $1FAA6(PC),A6
01FA62 360C MOVE.W A4,D3
01FA64 41FA0486 LEA $1FEEC(PC),A0
01FA68 3083 MOVE.W D3,(A0)
01FA6A 5343 SUBQ.W #1,D3
01FA6C 1CDB MOVE.B (A3)+,(A6)+
01FA6E 51CBFFFC DBRA D3,$1FA6C
01FA72 41FA0432 LEA $1FEA6(PC),A0
01FA76 43FA047E LEA $1FEF6(PC),A1
01FA7A 93C8 SUBA.L A0,A1
01FA7C 3609 MOVE.W A1,D3
01FA7E 5943 SUBQ.W #4,D3
01FA80 2248 MOVEA.L A0,A1
01FA82 93CE SUBA.L A6,A1
01FA84 2A4E MOVEA.L A6,A5
01FA86 30280002 MOVE.W 2(A0),D0
01FA8A 2CD8 MOVE.L (A0)+,(A6)+
01FA8C D049 ADD.W A1,D0
01FA8E 3B400002 MOVE.W D0,2(A5)
01FA92 5343 SUBQ.W #1,D3
01FA94 1CD8 MOVE.B (A0)+,(A6)+
01FA96 51CBFFFC DBRA D3,$1FA94
01FA9A 5347 SUBQ.W #1,D7
01FA9C 41FA044A LEA $1FEE8(PC),A0
01FAA0 91C9 SUBA.L A1,A0
01FAA2 20B804BA MOVE.L $4BA,(A0)
01FAA6 C0FC0005 MULU #5,D0 ; AUT
01FAAA 51CFFFFA DBRA D7,$1FAA6 ; FFFA = -6
01FAAE 203804BA MOVE.L $4BA,D0
01FAB2 90BA0038 SUB.L $1FAEC(PC),D0
01FAB6 4E4A TRAP #$A
01FAB8 207A003C MOVEA.L $1FAF6(PC),A0
01FABC 6114 BSR.S $1FAD2
01FABE 323A0030 MOVE.W $1FAF0(PC),D1
01FAC2 4E44 TRAP #4
01FAC4 610C BSR.S $1FAD2
01FAC6 41FA0016 LEA $1FADE(PC),A0
01FACA 6106 BSR.S $1FAD2
01FACC 2E7A001A MOVEA.L $1FAE8(PC),A7
01FAD0 4E73 RTE
Below is the print_string subroutine.
01FAD2 4850 PEA (A0)
01FAD4 3F3C0009 MOVE.W #9,-(A7)
01FAD8 4E41 TRAP #1
01FADA 5C8F ADDQ.L #6,A7
01FADC 4E75 RTS
The data section appears below.
01FADE 2020 MOVE.L -(A0),D0 ;' '
01FAE0 6279 BHI.S $1FB5B ;by
01FAE2 7465 MOVEQ #$65,D2 ;te
01FAE4 730D DC.W $730D ;s
_SSP => $01FAE8
01FAE6 0A000003 EORI.B #3,D0
01FAEA EF3A ROL.B D7,D2
start_time (new address) = 01FAEC
01FAEC 00000000 ORI.B #0,D0
memory => 01FAF0
time_header => 01FAF2
01FAF0 00040006 ORI.B #6,D4
memory_header => 01FAF6
01FAF4 DBB90006DC1F ADD.L D5,$6DC1F
01FAFA 00000000 ORI.B #0,D0
01FAFE 00000000 ORI.B #0,D0
NOTE: Section of reserved space omitted
to reduce size of listing.
01FE9E 00000000 ORI.B #0,D0
01FEA2 00000000 ORI.B #0,D0
Below is the original movable block. It has not
been effected by the move.
01FEA6 51CFFBFE DBRA D7,$1FAA6
01FEAA 203804BA MOVE.L $4BA,D0
01FEAE 90BA0038 SUB.L $1FEE8(PC),D0
01FEB2 4E4A TRAP #$A
01FEB4 207A003C MOVEA.L $1FEF2(PC),A0
01FEB8 6114 BSR.S $1FECE
01FEBA 323A0030 MOVE.W $1FEEC(PC),D1
01FEBE 4E44 TRAP #4
01FEC0 610C BSR.S $1FECE
01FEC2 41FA0016 LEA $1FEDA(PC),A0
01FEC6 6106 BSR.S $1FECE
01FEC8 2E7A001A MOVEA.L $1FEE4(PC),A7
01FECC 4E73 RTE
01FECE 4850 PEA (A0)
01FED0 3F3C0009 MOVE.W #9,-(A7)
01FED4 4E41 TRAP #1
01FED6 5C8F ADDQ.L #6,A7
01FED8 4E75 RTS
01FEDA 2020 MOVE.L -(A0),D0
01FEDC 6279 BHI.S $1FF57
01FEDE 7465 MOVEQ #$65,D2
01FEE0 730D DC.W $730D
01FEE2 0A000003 EORI.B #3,D0
01FEE6 EF3A ROL.B D7,D2
start_time (old address) = 01FEE8
01FEE8 00000000 ORI.B #0,D0
01FEEC 00040006 ORI.B #6,D4
01FEF0 DBB90006DC1F ADD.L D5,$6DC1F
The Bulk Move
The rest of the code in the movable section can be
moved as a block. This transaction takes place at the
_move_loop label in the source listing, at address 01FA94 in
the disassembly listings. Note that immediately after the
block move, the address of the declared variable start_time
is loaded into register A0, and that, before the time is
fetched, the contents of register A1 are subtracted from
those of A0 to correct the address in register A0. These
transactions take place at addresses 01FA9C and 01FAA0 of
the disassembly listings.
Why The Address Of start_time Must BE Corrected
From time to time, we must remind ourselves that pc-
relative addressing can be used discriminatingly, by
including the (pc) notation in the source operand, or it can
be forced by the PC-relative assembly mode. Now, although
it may not be obvious from discussions in the references,
when pc-relative addressing is used in the source operand of
the LEA instruction, it, like the DBRA instruction, performs
its transaction with a displacement stored in its extension
word. This is evident from the machine code at address
01FA9C, where the extension word can be seen to be $044A =
1098 (decimal). To confirm that this is indeed the
displacement between the LEA instruction and the label
start_time we need only subtract their addresses:
$01FEE8 - $01FA9C = $44C
and from this subtract the size of the LEA instruction:
$44C - 2 bytes = $44A.
Now, to decide whether or not a correction to this
displacement is necessary, we need only determine whether or
not the displacement between the two addresses is altered by
the bulk move. Well, of course it does. The instruction
and the label will be closer after the move, and, since the
label will be moved the same distance as was the dbra
instruction, the same correction factor, which is still in
A1, must be applied. The displacement in the LEA
instruction's extension word must be corrected at this point
in the program because the location in which the start time
is stored will be accessed to compute the AUT's execution
time after the algorithm_start loop has been executed. You
can see that transaction taking place at address 01FAB2 of
figure 7.3. There you can also see that the new
displacement is $0038, and that start_time's new address is
01FAEC.
Program 35 is a source listing of the program designed
to confirm the rudimentary reliability of the adaptive
algorithm in program 34. The performance data for the AUTs
in program 35 should (and does) match that obtained with
program 32 because the algorithms were copied from that
program. The data generated by program 35 follows its
listing.
Program 35. This program is used to verify that program 34
functions correctly.
; Program Name: TRAP9TST.S
; Version: 1.001
; Assembly Instructions:
; Assemble in PC-relative mode and save with a TOS extension.
; Execution Instructions:
; Execute from the desktop; or execute SPAWN.TTP, type TRAP9TST.TOS on
; its command line and view this program's output in TRAP9TST.DAT.
; Program Function:
; Measures the speed of multiplication algorithms. Verifies that
; TRAP_9P functions correctly.
calculate_program_size:
lea -$102(pc), a1 ; Fetch basepage start address.
lea program_end, a0 ; Fetch program end = array address.
trap #6 ; Return unused memory to op system.
lea stack, a7
the_mulu_instruction:
lea header_1, a0
lea header_4, a1
lea mulu_algorithm_start, a3
lea mulu_algorithm_end, a4
lea heading, a5
move.w #50000, d7
invoke_trap_9:
trap #9
addition:
lea header_2, a0
lea header_5, a1
lea addition_algorithm_start, a3
lea addition_algorithm_end, a4
lea heading, a5
move.b #0, (a5) ; Store a NULL in first byte to create a
move.w #50000, d7 ; null string so that heading gets printed
trap #9 ; only once.
shift_and_add:
lea header_3, a0
lea header_6, a1
lea shift_and_add_algorithm_start, a3
lea shift_and_add_algorithm_end, a4
lea heading, a5
move.w #50000, d7
trap #9
terminate:
trap #8
mulu_algorithm_start:
mulu #5, d0 ; Instruction in the loop.
mulu_algorithm_end:
addition_algorithm_start:
move.l d0, d2 ; To add one.
add.l d0, d0 ; To double to two.
add.l d0, d0 ; To double to four.
add.l d2, d0 ; To complete multiplication by 5.
addition_algorithm_end:
shift_and_add_algorithm_start:
move.l d0, d2 ; Save a copy to add.
asl.l #2, d0 ; Shift to multiply by 4.
add.l d2, d0 ; To complete multiplication by 5.
shift_and_add_algorithm_end:
data
heading: dc.b "TRAP9TST Execution Results",$D,$A,$D,$A,0
header_1: dc.b " MULU time: ",0
header_2: dc.b " Addition time: ",0
header_3: dc.b " Shift and add time: ",0
header_4: dc.b " MULU requisite memory: ",0
header_5: dc.b " Addition requisite memory: ",0
header_6: dc.b " Shift and add requisite memory: ",0
bss
align
ds.l 96
stack: ds.l 0
program_end: ds.l 0
end
TRAP9TST Execution Results
MULU time: 360 milliseconds
MULU requisite memory: 4 bytes
Addition time: 260 milliseconds
Addition requisite memory: 8 bytes
Shift and add time: 235 milliseconds
Shift and add requisite memory: 6 bytes
A Relocatable Version of the Adaptive Algorithm
The adaptive algorithm may also be installed as a trap
that has been assembled in AssemPro's Relocatable mode.
According to definitions previously established, program 36
is probably more appropriately designated as a Combo version
of program 34, because pc-relative addressing is used as
circumstances permit. The choice of assembly modes for LSR
programs is a bit more engaging than it is for other types
of programs because the length of time that the program is
resident is so much longer than the load time. It may be
that one decides to accept the longer load time as a
condition of receiving some other benefit via Relocatable
assembly, or it may be that circumstances dictate the
assembly mode.
From the beginning of the book, my emphasis has been on
the quest for speed. To me it seems that all other
desirable software attributes precipitate naturally from the
quest for this single feature. Consider the reason we use
computers in the first place. Were speed not the most
compelling issue, computers would probably not have been
invented. That's why, in formulating my thoughts for the
current discussion, I anticipated that the relative speed of
the PC-relative assembled adaptive algorithm versus that of
the Relocatable assembled algorithm would be the most
obvious primary topic. But, as it turns out, my attention
was drawn to a more compelling consideration: that of the
AUT's addressing and assembly modes.
As I mentioned in program 34's documentation, at the
time that I designed that program, I could not predict the
requirements of all possible AUTs, but I did guess that a
Relocatable version of the adaptive algorithm might be
desirable, or required, for some invoking algorithms.
Simultaneously, I guessed that there would be times when one
or the other assembly modes might be preferable for the AUT.
It was while I was comparing the first few lines of the
disassembly listing for the Relocatable version of the
custom trap to that of the PC-relative version that I was
struck with the realization that the assembly mode chosen
for the AUT would at times be dictated by the instructions
comprising that algorithm. Then, if instructions in the AUT
were to store data in locations being referenced via a
displacement, that data could be written into the adaptive
algorithm's program area, thereby disruptively modifying the
adaptive algorithm. To illustrate this phenomenon, I have
chosen a suitable AUT, but first, let me present the source
file and disassembly listing for program 36.
Program 36. The adaptive algorithm designed for Relocatable
Assembly. Execution results follow the listing.
; Program Name: TRAP_9R.S
; Version 1.005
; Assembly Instructions:
; Assemble in Relocatable mode and save with a PRG extension. TRAP_9P.S
; is a version of this program that can be assembled in PC-relative mode.
; I have prepared two versions because I can't predict the precise requirements
; of all possible algorithms which might invoke the custom trap that is
; installed by this program.
; Program Function:
; This is a LSR program that establishes a user defined trap #9. It may
; be executed from the desktop, but you may prefer to copy it to the AUTO
; folder of your boot partition or floppy disk so that it will execute
; automatically during boot.
program_start: ; Calculate program size and retain result.
lea program_end(pc), a3 ; Fetch program end address.
suba.l 4(a7), a3 ; Subtract basepage address.
enter_supervisor_mode:
move.l #0, -(sp) ; The zero turns on supervisor mode.
move.w #$20, -(sp) ; Function = super = GEMDOS $20.
trap #1 ; Go to supervisor mode.
addq.l #6, sp ; Supervisor stack pointer (SSP) returned in D0.
movea.l d0, a5 ; Save SSP in scratch register.
install_trap_9_routine: ; Store trap address at pointer address.
move.l #trap_9_routine, $A4
enter_user_mode:
pea (a5) ; Restore supervisor stack pointer.
move.w #$20, -(sp) ; Function = super = GEMDOS $20.
trap #1
addq.l #6, sp
relinquish_processor_control: ; Maintain memory residency.
move.w #0, -(sp) ; Exit code.
move.l a3, -(sp) ; Program size.
move.w #$31, -(sp) ; Function = ptermres = GEMDOS $31.
trap #1
trap_9_routine:
; See documentation in program TRAP_9P.S.
calculate_length_of_AUT:
suba.l a3, a4 ; Subtract start address from end address.
save_heading_addresses:
move.l a0, time_header ; Header to precede reported execution time.
move.l a1, memory_header ; Header to precede reported requisite memory.
transfer_stack_pointer:
; See documentation in program TRAP_9P.S.
move.l a7, _SSP ; Save system stack pointer.
move USP, a7 ; Transfer to invoking program's stack.
print_heading:
; See documentation in program TRAP_9P.S.
tst.b (a5) ; NULL string test.
beq.s print_time_header ; Branch if null.
movea.l a5, a0 ; Print heading.
bsr print_string
print_time_header:
movea.l time_header(pc), a0 ; Fetch address of time header.
bsr print_string
build_algorithm:
lea reserved(pc), a6 ; Fetch address of reserved space.
move.w a4, d3 ; AUT length initializes counter for copy.
move.w d3, memory ; AUT length stored for report.
subq.w #1, d3 ; Initialize counter for loop execution.
move_loop: ; Move AUT, byte by byte, to the area
move.b (a3)+, (a6)+ ; declared as "reserved".
dbra d3, move_loop ; Branch until D3 = -1.
; See documentation in program TRAP_9P.S.
lea aut_end(pc), a0 ; Calculate number of bytes in this program
lea program_end(pc), a1 ; which must be moved into "reserved space",
suba.l a0, a1 ; then initialize counter D3 with the number
move.w a1, d3 ; of bytes to copy.
subq.w #4, d3 ; Subtract the DBRA requisite memory.
movea.l a0, a1 ; Calculate amount the DBRA instruction must
suba.l a6, a1 ; be moved and save the new location in scratch
movea.l a6, a5 ; register for displacement correction.
move.w 2(a0), d0 ; Fetch current DBRA displacement value.
move.l (a0)+, (a6)+ ; Move the DBRA instruction.
add.w a1, d0 ; Correct DBRA displacement to the value it
move.w d0, 2(a5) ; should be after the move.
subq.w #1, d3 ; Initialize counter for bulk copy.
_move_loop: ; Move rest of algorithm in a bulk copy.
move.b (a0)+, (a6)+
dbra d3, _move_loop
subq.w #1, d7 ; Initialize D7 for AUT loop execution.
lea start_time(pc), a0 ; Fetch address of declared variable.
; NOTE: No correction is necessary for the start time variable, as it was
; in the PC-relative assembled program TRAP_9P, if pc-relative
; addressing is not used for the label in the instruction below that
; subtracts the contents of start_time from d0. That instruction is
; marked by asterisks below. In the TRAP_9P program AssemPro used
; pc-relative addressing for that label, as in does for all labels
; when a program is assembled in PC-relative mode, that's why the
; correction for that label was necessary in that program.
; In the instruction above, pc-relative addressing can be used because
; that instruction is not relocated by the bulk move.
; Now, the address that is used in the instruction below is not the
; relocated address of the variable; the original location of the
; variable is used. When you look at the disassembly listing for this
; program, you will see that the extension longword (not word) contains
; the actual address of the variable, not a displacement.
; Of course, in the instruction above, the extension word does contain
; a displacement, but this displacement remains valid because the
; instruction is not moved.
; NOTE: Since processor is in supervisor mode, there is no reason to use
; trap #3 to fetch time. The 200hz vector can be referenced directly.
move.l $4BA, (a0) ; Fetch and store start time.
aut_start:
reserved: ds.b 1024 ; Space for loop construction = 1024 bytes.
aut_end:
dbra d7, aut_start
move.l $4BA, d0 ; Fetch end time.
;****
sub.l start_time, d0 ; Subtract start time from end time.
;****
trap #10 ; Convert to decimal milliseconds and print.
print_memory_header:
movea.l memory_header(pc), a0
bsr.s print_string
print_requisite_memory:
move.w memory(pc), d1 ; Fetch AUT's requisite memory.
trap #4 ; Returns address of decimal string in A0.
bsr.s print_string
lea header_1(pc), a0 ; Print requisite memory units label.
bsr.s print_string
; See documentation in program TRAP_9P.S.
movea.l _SSP, a7 ; Restore system stack pointer.
rte
;
; SUBROUTINES
;
print_string:
pea (a0)
move.w #9, -(sp)
trap #1
addq.l #6, sp
rts
data
header_1: dc.b ' bytes',$D,$A,0
bss
align
_SSP: ds.l 1 ; Address of the system stack will be stored here.
start_time: ds.l 1 ; Variable for time at start of AUT processing.
memory: ds.w 1 ; Variable for AUT's requisite memory.
time_header: ds.l 1 ; Points to address of header which is to precede
; the reported AUT processing time.
memory_header: ds.l 1 ; Points to address of header which is to precede
; the reported AUT requisite memory.
program_end: ds.l 0
end
TRAP9TST Execution Results With TRAP_9R.PRG Resident.
These results match those obtained with the PC-relative
assembled adaptive algorithm.
MULU time: 360 milliseconds
MULU requisite memory: 4 bytes
Addition time: 255 milliseconds
Addition requisite memory: 8 bytes
Shift and add time: 230 milliseconds
Shift and add requisite memory: 6 bytes
Program 34 is sufficiently similar to program 36 so
that most of the documentation for the first serves to
document the second; nevertheless, I still want to point out
a few important differences between the programs. Under the
label save_heading_addresses, near the beginning of the
trap_9_routine, note that the addresses are stored directly
to the labels in program 36 using absolute addressing, as
opposed to the indirect addressing method used in program
34. The absolute addressing mode is also used to store the
system stack pointer. But, as you can see, wherever
possible, pc-relative addressing is indicated in most of the
instructions that contain labels in the source operand.
That mode is indicated with the (pc) notation.
There is one particular instruction in which I
deliberately avoided pc-relative addressing so that I could
illustrate a specific advantage of using the absolute
addressing mode in the Relocatable assembled adaptive
algorithm. That is the instruction which subtracts the
start time from the end time:
sub.l start_time, d0.
The instruction has been made conspicuous by the ;**** group
being placed before and after it. Using absolute addressing
in this instruction obviates the inconvenience of a
correction to access the start_time variable, as was done in
program 34.
Figure 7.4. Disassembly listing of TRAP_9R.PRG as it resides
in memory after the initial invocation of the adaptive
algorithm.
TRAP_9R.PRG After First Invocation
01F9FE 47FA04FC LEA $1FEFC(PC),A3
01FA02 97EF0004 SUBA.L 4(A7),A3
01FA06 2F3C00000000 MOVE.L #0,-(A7)
01FA0C 3F3C0020 MOVE.W #$20,-(A7)
01FA10 4E41 TRAP #1
01FA12 5C8F ADDQ.L #6,A7
01FA14 2A40 MOVEA.L D0,A5
Note that absolute addressing is used in both operands
of the instruction that stores the adaptive algorithm's
address in the trap #9 vector.
01FA16 23FC0001FA360000 MOVE.L #$1FA36,$A4
00A4
01FA20 4855 PEA (A5)
01FA22 3F3C0020 MOVE.W #$20,-(A7)
01FA26 4E41 TRAP #1
01FA28 5C8F ADDQ.L #6,A7
01FA2A 3F3C0000 MOVE.W #0,-(A7)
01FA2E 2F0B MOVE.L A3,-(A7)
01FA30 3F3C0031 MOVE.W #$31,-(A7)
01FA34 4E41 TRAP #1
trap_9 routine starts here
01FA36 99CB SUBA.L A3,A4
01FA38 23C80001FEF4 MOVE.L A0,$1FEF4
01FA3E 23C90001FEF8 MOVE.L A1,$1FEF8
01FA44 23CF0001FEEA MOVE.L A7,$1FEEA
01FA4A 4E6F MOVE.L USP,A7
01FA4C 4A15 TST.B (A5)
01FA4E 6706 BEQ.S $1FA56
01FA50 204D MOVEA.L A5,A0
01FA52 61000480 BSR $1FED4
01FA56 207A049C MOVEA.L $1FEF4(PC),A0
01FA5A 61000478 BSR $1FED4
01FA5E 4DFA0046 LEA $1FAA6(PC),A6
01FA62 360C MOVE.W A4,D3
01FA64 33C30001FEF2 MOVE.W D3,$1FEF2
01FA6A 5343 SUBQ.W #1,D3
01FA6C 1CDB MOVE.B (A3)+,(A6)+
01FA6E 51CBFFFC DBRA D3,$1FA6C
01FA72 41FA0432 LEA $1FEA6(PC),A0
01FA76 43FA0484 LEA $1FEFC(PC),A1
01FA7A 93C8 SUBA.L A0,A1
01FA7C 3609 MOVE.W A1,D3
01FA7E 5943 SUBQ.W #4,D3
01FA80 2248 MOVEA.L A0,A1
01FA82 93CE SUBA.L A6,A1
01FA84 2A4E MOVEA.L A6,A5
01FA86 30280002 MOVE.W 2(A0),D0
01FA8A 2CD8 MOVE.L (A0)+,(A6)+
01FA8C D049 ADD.W A1,D0
01FA8E 3B400002 MOVE.W D0,2(A5)
01FA92 5343 SUBQ.W #1,D3
01FA94 1CD8 MOVE.B (A0)+,(A6)+
01FA96 51CBFFFC DBRA D3,$1FA94
01FA9A 5347 SUBQ.W #1,D7
01FA9C 41FA0450 LEA $1FEEE(PC),A0
01FAA0 20B9000004BA MOVE.L $4BA,(A0)
01FAA6 C0FC0005 MULU #5,D0
01FAAA 51CFFFFA DBRA D7,$1FAA6
01FAAE 2039000004BA MOVE.L $4BA,D0
01FAB4 90B90001FEEE SUB.L $1FEEE,D0
01FABA 4E4A TRAP #$A
01FABC 207A003E MOVEA.L $1FAFC(PC),A0
01FAC0 6116 BSR.S $1FAD8
01FAC2 323A0032 MOVE.W $1FAF6(PC),D1
01FAC6 4E44 TRAP #4
01FAC8 610E BSR.S $1FAD8
01FACA 41FA0018 LEA $1FAE4(PC),A0
01FACE 6108 BSR.S $1FAD8
01FAD0 2E790001FEEA MOVEA.L $1FEEA,A7
01FAD6 4E73 RTE
print_string subroutine
01FAD8 4850 PEA (A0)
01FADA 3F3C0009 MOVE.W #9,-(A7)
01FADE 4E41 TRAP #1
01FAE0 5C8F ADDQ.L #6,A7
01FAE2 4E75 RTS
01FAE4 2020 MOVE.L -(A0),D0
01FAE6 6279 BHI.S $1FB61
01FAE8 7465 MOVEQ #$65,D2
01FAEA 730D DC.W $730D
01FAEC 0A000003 EORI.B #3,D0
01FAF0 EF40 ASL.W #7,D0
01FAF2 00000000 ORI.B #0,D0
01FAF6 00040006 ORI.B #6,D4
01FAFA DBBF DC.W $DBBF
01FAFC 0006DC25 ORI.B #$25,D6
01FB00 00000000 ORI.B #0,D0
01FB04 00000000 ORI.B #0,D0
01FE9C 00000000 ORI.B #0,D0
01FEA0 00000000 ORI.B #0,D0
The movable section. Starting address is $01FEA6
01FEA4 000051CF ORI.B #-$31,D0
01FEA8 FBFE DC.W $FBFE
01FEAA 2039000004BA MOVE.L $4BA,D0
01FEB0 90B90001FEEE SUB.L $1FEEE,D0 ; sub.l start_time, d0
01FEB6 4E4A TRAP #$A
01FEB8 207A003E MOVEA.L $1FEF8(PC),A0
01FEBC 6116 BSR.S $1FED4
01FEBE 323A0032 MOVE.W $1FEF2(PC),D1
01FEC2 4E44 TRAP #4
01FEC4 610E BSR.S $1FED4
01FEC6 41FA0018 LEA $1FEE0(PC),A0
01FECA 6108 BSR.S $1FED4
01FECC 2E790001FEEA MOVEA.L $1FEEA,A7
01FED2 4E73 RTE
01FED4 4850 PEA (A0)
01FED6 3F3C0009 MOVE.W #9,-(A7)
01FEDA 4E41 TRAP #1
01FEDC 5C8F ADDQ.L #6,A7
01FEDE 4E75 RTS
01FEE0 2020 MOVE.L -(A0),D0
01FEE2 6279 BHI.S $1FF5D
01FEE4 7465 MOVEQ #$65,D2
01FEE6 730D DC.W $730D
01FEE8 0A000003 EORI.B #3,D0
01FEEC EF40 ASL.W #7,D0
01FEEE 00000000 ORI.B #0,D0
01FEF2 00040006 ORI.B #6,D4
01FEF6 DBBF DC.W $DBBF
01FEF8 0006DC25 ORI.B #$25,D6
I don't know if I've mentioned it before, but even if I
have, it is worth pointing out again that the addresses and
the instructions contained therin are not displayed in
disassembly listings as neat as one might like. If you
refer to the disassembly listing just above this paragraph
you can see that the address for the variable _SSP, 01FEEA,
is not displayed. To locate the contents of that address,
it is necessary to locate the nearest address that is
displayed. In this case, that is address 01FEE8. There you
see that the byte stored in that address is the ASCII code
for a linefeed. Then, displayed on the same line, at
address 01FEE9 is the 00 byte inserted by the align pseudo-
op. Finally, also displayed on the same line, at address
01FEEA, is the first byte of the value stored in _SSP. That
byte is 00. The next byte of _SSP, at address 01FEEB, is 03
and is displayed on that same line. To locate the other two
bytes stored in _SSP one must look at the next line, which
is address 01FEEC.
The value stored in the variable start_time is neatly
displayed on the next line, at address 01FEEE, but on the
line following, at address 01FEF2, the first word displayed
is the contents of the variable memory, while the second
word shown on that line is the first word of the value
stored in time_header (address = 01FEF4). The second word
of time-header is displayed on the line labeled 01FEF6. The
last address, 01FEF8, is the variable memory_header. After
a while, you get used to all of this, but I wanted to take
the opportunity to mention something that is trivial once
you know the "secret", but which can be a source of
consternation to one who is not accoustomed to reading the
listings.
To the extent that the major difference between the PC-
relative version and the Relocatable version of the custom
trap is the addressing mode used in particular instructions,
it seems reasonable to question the advantages, or
disadvantages, of each addressing mode. One advantage of
absolute addressing has already been discussed: it relieves
one of the address correction burden in some instances.
Naturally, there is the 16 bit displacement limitation
attached to pc-relative addressing, but I find this to be
rather insignificant for two reasons. I don't write many
assembly language programs in which instructions and labels
are 32,767 bytes apart, and when I do, I can find ways to
circumvent that limitation.
In maintaining the theme of the book, however, it seems
prudent to compare the execution speed and requisite memory
for instructions using the two addressing modes when that
16-bit limitation of pc-relative addressing is neglected.
Program 37 was designed to yield the appropriate comparative
data for this discussion, and, in addition, to serve as a
vehicle to illustrate possible consequences of the
disruptive modification phenomenon previously mentioned.
Program 37. A program that compares the execution speed and
requisite memory of two algorithms that write data to a
location.
; Program Name: LEA_MOVE.S
; Version: 1.003
; Assembly Instructions:
; Assemble in Relocatabe mode and save with a TOS extension.
; Program Function:
; Compares the relative speed and memory requirements of
; lea label(pc), Am
; move.l An, (Am)
; and lea label, Am
; move.l An, (Am)
; to move.l An, label
; The execution time reported is for 50,000 executions of each algorithm.
; In addition, this program's first AUT can disrupt the adaptive algorithm
; because the address that is loaded in register A2 depends on the displacement
; between the instruction "lea label(pc), a2" and the label it references.
; When that instruction is executed in the adaptive algorithm, whatever
; address is the "displacement" distance away from the instruction will be
; loaded into register A2. Then the instruction following, "move.l a0, (a2)
; will write whatever is in register A0 into that memory location.
; If the address stored in register A2 while the AUT is being executed by
; the adaptive algorithm happens to be within the adaptive algorithm's program
; or data space, the results could be unpleasant.
; In fact, an undisciplined store of this type is capable of corrupting any
; instruction or data that happens to be separated from the LEA instruction by
; an amount equal to the displacement, and the results could be catastrophic.
calculate_program_size:
lea -$102(pc), a1 ; Fetch basepage start address.
lea program_end, a0 ; Fetch program end = array address.
trap #6 ; Return unused memory to op system.
lea stack, a7
initialize_registers_1:
lea header_1, a0
lea header_2, a1
lea lea_move_start, a3
lea lea_move_end, a4
lea heading, a5
move.w #50000, d7
trap #9
initialize_registers_2:
lea header_3, a0
lea header_4, a1
lea long_lea_start, a3
lea long_lea_end, a4
lea heading, a5
move.b #0, (a5) ; Store a NULL in first byte to create a
move.w #50000, d7 ; null string so that heading gets printed
trap #9 ; only once.
initialize_registers_3:
lea header_5, a0
lea header_6, a1
lea move_start, a3
lea move_end, a4
lea heading, a5
move.w #50000, d7
trap #9
terminate:
trap #8
lea_move_start: ; This algorithm is potentially nocuous to
lea label(pc), a2 ; the adaptive algorithm because a displacement
move.l a0, (a2) ; will be stored in A2, not an address.
lea_move_end:
long_lea_start: ; This algorithm poses no threat because an
lea label, a2 ; address in this program's data area will be
move.l a0, (a2) ; stored in A2.
long_lea_end:
move_start: ; This algorithm poses no threat because an
move.l a0, label ; address in this program's data area will be
move_end: ; stored in A2.
data
heading: dc.b "LEA_MOVE Program Results",$D,$A,$D,$A,0
header_1: dc.b " Elapsed time for lea label(pc), Am ",$D,$A
dc.b " move.l An, (Am): ",0
header_2: dc.b " Memory required for first algorithm: ",0
header_3: dc.b $D,$A," Elapsed time for lea label, Am ",$D,$A
dc.b " move.l An, (Am): ",0
header_4: dc.b " Memory required for second algorithm: ",0
header_5: dc.b $D,$A," Elapsed time for move.l An, label: ",0
header_6: dc.b " Memory required for third algorithm: ",0
bss
align
label: ds.l 1
ds.l 96
stack: ds.l 0
program_end: ds.l 0
end
Before proceeding with the discussion initiated above,
I want to show you the alterations that I made to the
TRAPS.S program so that the adaptive algorithm would be
installed automatically during boot. First, I renamed the
trap installation program from TRAPS.S to CUSTOM.S, then I
inserted the TRAP_9P.S program into that source file before
assembling the entire conglomeration as CUSTOM.PRG. I don't
want to repeat the entire CUSTOM.S source file, therefore, I
have listed only the relevant portions in figure 7.5.
Figure 7.5. Alterations to TRAPS.S to convert it to
CUSTOM.S. The listing below is the contents of a file
called CUSTOM.DOC. The listing shows only the changes that
must be made to TRAPS.S to convert it to CUSTOM.S. Of
course, the alterations to TRAPS.S can be made by copying
the contents of CUSTOM.DOC to that file using a suitable
editor, such as TEMPUS or 1ST WORD PLUS.
; Program Name: CUSTOM.S
; Version 1.006
; Algorithms for trap #9, the intelligent algorithm, and trap #10 are included,
; along with the other custom traps from the TRAPS.S program.
; Assembly Instructions:
; Assemble in PC-relative mode and save with a PRG extension.
install_trap_9_routine: ; Note: pointer = vector = pointer.
lea trap_9_routine, a0 ; Fetch address of trap #9 routine.
move.l a0, $A4 ; Store custom trap address in pointer.
trap_9_routine:
; NOTE: You may want to execute a program that invokes this trap from the
; AssemPro debugger and single step through the trap instructions, or
; you many want to set breakpoints and use the "Run program" button to
; execute to a breakpoint. It the exercise of this latter option that
; can cause problems. If you want to set a breakpoint at an instruction
; that occurs before the "reserved space", no precautions need be taken,
; except that you must realize that you could halt at a position after
; the start time has been initialized, and the execution time will be
; accumulating during the halt.
; But, if you want to set a breakpoint at at an instruction that occurs
; after the "reserved space", you should not do so until the instructions
; that perform the bulk move have been executed. If you set a breakpoint
; at an instruction in the program space which follows the "reserved
; space" and then execute the bulk move instructions, the breakpoint you
; have set will cause an illegal command error.
; This custom trap must be invoked by a program which contains one or more
; algorithms for which performance data, in the form of execution time and
; requisite memory, is desired. The trap must be invoked once for each
; algorithm under test (AUT). The trap is intended to be a performance
; report generator so that performance data for specific algorithms can be
; compared.
; If the program invoking the trap is spawned by SPAWN.TTP, the performance
; data will be printed to a file that is created by SPAWN.TTP. The name of
; the file will be the name of the spawned program with a DAT suffix.
; If the program invoking the trap is executed from the desktop, the
; performance data will be printed to the screen.
; The AUT can be assembled in PC-relative or Relocatable mode.
; The custom trap expects the following parameters to be contained in the
; specified registers prior to invocation:
; A0 = The address of a null terminated header to precede the reported
; AUT execution time.
; A1 = The address of a null terminated header to precede the reported
; AUT requisite memory.
; A3 = The starting address of an algorith containing one or more instructions.
; This algorithm becomes the AUT upon invocation of the trap.
; A4 = The ending address of the AUT.
; A5 = At the initial invocation by a specific program, A5 is expected to
; contain the address of a null terminated heading that is a brief
; description of the data that is to be reported by the trap. During
; all subsequent invocations by that program, A5 must contain the
; address of a null string.
; D7 = A word length decimal value which specifies the number of times the
; the AUT is to be executed in a loop in order to generate the execution
; time data. Register D7 is the only register which cannot be used by
; the AUT. That's because D7 is used as the AUT execution loop counter.
; The custom trap uses the information passed in A3, A4 and D7 to construct
; a loop around the AUT. The loop is executed the number of times specified
; in register D7. The time required to execute the loop is calculated and
; printed along with the AUT's requisite memory.
; How Trap #9 Works:
; The custom trap is an intelligent, self-modifying algorithm. Prior
; to an initial invocation, the trap #9 routine is composed of three sections.
; The first section contains instructions which are permanently located. The
; second section is a 1,024 byte block of reserved space into which the AUT
; and the third section is to be copied. The third section contains the custom
; trap instructions that must be copied so that the first instruction in the
; third section immediately follows the last instruction of the algorithm
; under test. The size that I chose for the reserved space was arbitrary.
; After an invocation, the custom trap calculates the number of bytes in
; the algorithm under test. Then it initializes register D7 for the AUT's
; execution loop. After saving the addresses of the headers that were passed
; as parameters so that the registers in which they were passed can be used
; by trap #9 and/or the algorithm under test, the trap prints the AUT's
; primary heading and the execution time header. Then it copies the AUT into
; the reserved space.
; After the AUT has been copied into the reserved space, the instructions
; and data in the third section of the custom trap are moved into immediate
; proximity of the AUT so that the first instruction of section three follows
; immediately after the last instruction of the algorithm under test.
; The AUT is executed the number of times requested via the value passed
; in D7, then the execution time and requisite memory of the AUT are printed.
; After the custom trap has been invoked once, the reserved space will
; not be completely clear during the remaining portion of the power-up cycle.
; When the trap is subsequently invoked, the instructions that are already in
; the reserved space will simply be overwritten by the instructions of the
; new algorithm under test and those of the third section.
calculate_length_of_AUT:
suba.l a3, a4 ; Subtract start address from end address.
save_heading_addresses:
lea time_header, a2 ; Header to precede reported execution time.
move.l a0, (a2)
lea memory_header, a2 ; Header to precede reported requisite memory.
move.l a1, (a2)
transfer_stack_pointer:
; Upon entrance into this custom trap, A7 contains the address of the system
; stack pointer. Because the system stack may be too short for use by the AUT,
; the invoking program must provide a user stack, and, here, I am going to
; store the address of that stack in A7. Before returning to the invoking
; program, the system stack pointer must be restored so that the return address
; can be found by the processor.
lea _SSP, a0 ; Fetch address of declared variable.
move.l a7, (a0) ; Save system stack pointer.
move USP, a7 ; Transfer to invoking program's stack.
print_heading:
; NOTE: When two or more algorithms are to invoke the trap from a program,
; the heading should be printed only once, at the beginning of the
; report; therefore, the string address in A5 should point to a
; null terminated string during the first invocation of the trap, but
; it should point to a null string on subsequent invocations from the
; same program.
; For that reason, the test for NULL string is included below, so that
; only one attempt to print the heading will be made.
tst.b (a5) ; NULL string test.
beq.s print_time_header ; Branch if null.
movea.l a5, a0 ; Print heading.
bsr print_string_9
print_time_header:
movea.l time_header, a0 ; Fetch address of time header.
bsr print_string_9
build_algorithm:
lea reserved_space, a6 ; Fetch address of reserved space.
move.w a4, d3 ; AUT length initializes counter for copy.
lea memory, a0 ; Store AUT's requisite memory.
move.w d3, (a0) ; AUT length stored for report.
subq.w #1, d3 ; Initialize counter for loop execution.
move_loop: ; Move AUT, byte by byte, to the area
move.b (a3)+, (a6)+ ; declared as "reserved_space".
dbra d3, move_loop ; Branch until D3 = -1.
; A6 is now pointing to the location at which the AUT execution loop's DBRA
; instruction must be copied. The DBRA copy must be processed apart from the
; other instructions in the custom trap which must be copied to "close up" the
; gap between the last statement of the algorithm under test and the rest
; of the custom trap's instructions.
lea algorithm_end, a0 ; Calculate number of bytes in this program
lea trap_9_end, a1 ; which must be copied into "reserved_space",
suba.l a0, a1 ; then initialize counter D3 with the number
move.w a1, d3 ; of bytes to copy.
subq.w #4, d3 ; Subtract the DBRA requisite memory.
movea.l a0, a1 ; Calculate amount the DBRA instruction must
suba.l a6, a1 ; be moved and save the new location in scratch
movea.l a6, a5 ; register for displacement correction.
move.w 2(a0), d0 ; Fetch current DBRA displacement value.
move.l (a0)+, (a6)+ ; Move the DBRA instruction.
add.w a1, d0 ; Correct DBRA displacement to the value it
move.w d0, 2(a5) ; should be after the move.
subq.w #1, d3 ; Initialize counter for bulk copy.
_move_loop: ; Move rest of algorithm in a bulk copy.
move.b (a0)+, (a6)+
dbra d3, _move_loop
subq.w #1, d7 ; Initialize D7 for AUT loop execution.
lea start_time, a0 ; Fetch address of declared variable.
suba.l a1, a0 ; Correct address to after-move location.
; NOTE: Since processor is in supervisor mode, there is no reason to use
; trap #3 to fetch time. The 200hz vector can be referenced directly.
move.l $4BA, (a0) ; Fetch and store start time.
algorithm_start:
reserved_space: ds.b 1024 ; Space for loop construction = 1024 bytes.
algorithm_end:
dbra d7, algorithm_start
move.l $4BA, d0 ; Fetch end time.
sub.l start_time, d0 ; Subtract start time from end time.
trap #10 ; Convert to decimal milliseconds and print.
print_memory_header:
movea.l memory_header, a0 ; Fetch address of memory header.
bsr print_string_9
print_requisite_memory:
move.w memory, d1 ; Fetch AUT's requisite memory.
trap #4 ; Returns address of decimal string in A0.
bsr print_string_9
lea header_1, a0 ; Print requisite memory units label.
bsr print_string_9
; NOTE: I had thought that I might have to restore the user stack pointer
; before returning to the calling program because I can't know to what
; extent the AUT has corrupted the user stack.
; But, as it turns out, since the value in USP is stored in register
; A7 at the beginning of the program, and since the system stack pointer
; is then active and remains so for the duration of the trap invocation,
; only the value in register A7 (which is the system stack pointer, but
; which points to the user stack) is altered; the value in USP remains
; uncorrupted, therefore, nothing needs be restored but the system stack
; pointer (so that it again points to the system stack).
; On subsequent invocations the value in USP, which remains stable, will
; simply be restored in A7, as often as the trap is invoked.
movea.l _SSP, a7 ; Restore system stack pointer so that it
rte ; points to the system stack.
trap_9_data_and_subroutine:
; NOTE: Trap #9 must have its own print_string subroutine because the
; displacements in the bsr print_string instructions in trap #9 are
; not altered by the move of this section into the reserved space.
; Therefore, the displacement between the instructions and the
; print_string subroutine must remained constant. The only way to
; maintain the displacement (without corrections) is to move the
; subroutine along with the rest of this section.
; Then what happens is this: the bsr print_string instructions located
; above the reserved space cause a branch to the pre-move location of the
; subroutine; those instructions located below the reserved space cause
; a branch to the after-move location of the subroutine. If you say all
; of this to yourself very quickly, it is less confusing. I think.
print_string_9: ; Expects address of string to be in A0.
pea (a0) ; Push address of string onto stack.
move.w #9, -(sp) ; Function = c_conws = GEMDOS $9.
trap #1 ; GEMDOS call
addq.l #6, sp ; Reset stack pointer to top of stack.
rts
header_1: dc.b ' bytes',$D,$A,0
align
_SSP: ds.l 1 ; Address of the system stack will be stored here.
start_time: ds.l 1 ; Variable for time at start of AUT processing.
memory: ds.w 1 ; Variable for AUT's requisite memory.
time_header: ds.l 1 ; Points to address of header which is to precede
; the reported AUT processing time.
memory_header: ds.l 1 ; Points to address of header which is to precede
; the reported AUT requisite memory.
trap_9_end: ds.l 1 ; Marks end of trap #9 section.
; This statement and the items in boldface below are not part of the trap_9
; routine. They are there to provide orientation so that the appropriate
; sections of this document may be copied into the new CUSTOM.S document.
trap_10_routine:
print_string: ; Expects address of string to be in A0.
pea (a0) ; Push address of string onto stack.
move.w #9, -(sp) ; Function = c_conws = GEMDOS $9.
trap #1 ; GEMDOS call
addq.l #6, sp ; Reset stack pointer to top of stack.
rts
data
subtrahend: dc.l $3B9ACA00,$5F5E100,$989680,$F4240,$186A0,$2710
dc.l $3E8,$64,$A,$1,$0
hex_table: dc.b '0123456789ABCDEF'
spawned: dc.b $0
space: dc.b " ",0
units_label: dc.b " milliseconds", $D,$A,0
bss
align ; Align storage on a word boundary.
after_load_time: ds.w 1 ; Spawned program's after load time.
decimal: ds.l 3 ; Output buffer. Must be NULL terminated.
hexadecimal: ds.l 3 ; Output buffer. Must be NULL terminated.
program_end: ds.l 0
end
The execution results of program 37 are shown in figure
7.6. The data illustrates that there is no speed nor
requisite memory difference between the first and third
algorithms. The addressing modes used in those algorithms
is pc-relative for the first and absolute for the third.
The first algorithm may seem to be hampered by the extra
instruction needed to indirectly store the data in An, but
it is only when pc-relative addressing cannot be used with
the LEA instruction that the extra instruction becomes a
handicap. This is evident from the data for the second
algorithm, where the execution time and requisite memory are
greater than that for the other two algorithms. But, in
general, the algorithms being considered will be those with
the appearance of algorithms one and two. Clearly, the
execution speed and requisite memory of those are identical.
Figure 7.6. Execution results of program 37 with the newly
created CUSTOM.PRG resident.
LEA_MOVE Program Results
Elapsed time for lea label(pc), Am
move.l An, (Am): 205 milliseconds
Memory required for first algorithm: 6 bytes
Elapsed time for lea label, Am
move.l An, (Am): 235 milliseconds
Memory required for second algorithm: 8 bytes
Elapsed time for move.l An, label: 205 milliseconds
Memory required for third algorithm: 6 bytes
A partial disassembly of the adaptive algorithm during
invocation, but before AUT loop execution, by each of
program 37's three AUT's is shown in figure 7.7. Although
the addresses here are different than those shown in
previous disassembly listings, because the custom trap was
installed by CUSTOM.PRG during boot, it is easy to see that
the displacement in the extension word of the first AUT's
LEA instruction causes an address to be loaded into A2 that
is not located in the invoking program, but is located
within the custom trap's space. No problem occurs because
it just so happens that the address $17D54 falls within the
unused reserved space. But the folly of depending on this
kind of luck should be apparent. This problem does not
occur with the second and third AUTs because relative
addressing is not used in those instructions; therefore the
address loaded into A2 during invocation by the second and
third AUTs is located within the invoking program's address
space.
Figure 7.7. A partial disassembly of the adaptive algorithm
during invocation by program 37's three AUTs.
LEA_MOVE's 1st AUT in CUSTOM's adaptive
017BCA 45FA0188 LEA $17D54(PC),A2
017BCE 2488 MOVE.L A0,(A2)
017BD0 51CFFFF8 DBRA D7,$17BCA
LEA_MOVE's 2nd AUT in CUSTOM's adaptive
017BCA 45F90006D730 LEA $6D730,A2
017BD0 2488 MOVE.L A0,(A2)
017BD2 51CFFFF6 DBRA D7,$17BCA
LEA_MOVE's 3rd AUT in CUSTOM's adaptive
017BCA 23C80006D730 MOVE.L A0,$6D730
017BD0 51CFFFF8 DBRA D7,$17BCA
I must not leave you with the impression that the
problem with the address being loaded in the first AUT has
much to do with the fact that it is not located within the
boundaries of the invoking program. No, it is because the
address it does load is within the boundaries of the
adaptive algorithm and that address is going to be corrupted
by the second statement of the first algorithm. The purpose
of the three AUT's is to compare the execution speed and
requisite memory of three algorithms which store data in an
address that is specified by a label. As far as the test is
concerned, the location of the label can be anyplace in
memory.
Of course, I would not leave you to suffer this
dilemma. I have prepared another version of program 37 and
another version of the adaptive algorithm to illustrate a
method of dealing with this problem of referencing labels in
an AUT's data area from within the execution loop located in
the adaptive algorithm. Program 38 is the program that will
exercise the new adaptive algorithm. AUT_DATA's execution
results are shown in figure 7.8.
Program 38. In this program, data areas are declared for
each AUT, and the variables label_1, label_2 and label_3 are
the labels for those data areas.
; Program Name: AUT_DATA.S
; Version: 1.001
; Assembly Instructions:
; Assemble in Relocatabe mode and save with a TOS extension.
; Program Function:
; Compares the relative speed and memory requirements of
; lea label(pc), Am
; move.l An, (Am)
; and lea label, Am
; move.l An, (Am)
; to move.l An, label
; The execution time reported is for 50,000 executions of each algorithm.
; In this program, a data area is declared before the AUT's so that the
; first AUT can't disrupt the adaptive algorithm. The address of the data
; area is passed to the adaptive algorithm in register A2.
; A branch statement is included in each AUT so that the adaptive algorithm
; will jump over the data area.
; Of course, this calls for a redesign of the adaptive algorithm.
calculate_program_size:
lea -$102(pc), a1 ; Fetch basepage start address.
lea program_end, a0 ; Fetch program end = array address.
trap #6 ; Return unused memory to op system.
lea stack, a7
initialize_registers_1:
lea header_1, a0
lea header_2, a1
lea lea_data_area, a2
lea lea_move_start, a3
lea lea_move_end, a4
lea heading, a5
move.w #50000, d7
trap #9
initialize_registers_2:
lea header_3, a0
lea header_4, a1
lea long_data_area, a2
lea long_lea_start, a3
lea long_lea_end, a4
lea heading, a5
move.b #0, (a5) ; Store a NULL in first byte to create a
move.w #50000, d7 ; null string so that heading gets printed
trap #9 ; only once.
initialize_registers_3:
lea header_5, a0
lea header_6, a1
lea move_data_area, a2
lea move_start, a3
lea move_end, a4
lea heading, a5
move.w #50000, d7
trap #9
terminate:
trap #8
lea_data_area:
bra.s lea_move_start
label_1: ds.l 1
lea_move_start:
lea label_1(pc), a2
move.l a0, (a2)
lea_move_end:
long_data_area:
bra.s long_lea_start
label_2: ds.l 1
long_lea_start:
lea label_2, a2
move.l a0, (a2)
long_lea_end:
move_data_area:
bra.s move_start
label_3: ds.l 1
move_start:
move.l a0, label_3
move_end:
data
heading: dc.b "AUT_DATA Program Results",$D,$A,$D,$A,0
header_1: dc.b " Elapsed time for lea label(pc), Am ",$D,$A
dc.b " move.l An, (Am): ",0
header_2: dc.b " Memory required for first algorithm: ",0
header_3: dc.b $D,$A," Elapsed time for lea label, Am ",$D,$A
dc.b " move.l An, (Am): ",0
header_4: dc.b " Memory required for second algorithm: ",0
header_5: dc.b $D,$A," Elapsed time for move.l An, label: ",0
header_6: dc.b " Memory required for third algorithm: ",0
bss
align
ds.l 96
stack: ds.l 0
program_end: ds.l 0
end
Program 39. A new version of the program that installs the
adaptive algorithm. This version is designed to process
AUT's in which an area of memory has been set aside to be
used as a data area. Note that the pseudo-op data is not
used.
; Program Name: TRAP9DAT.S
; Version 1.002
; Assembly Instructions:
; Assemble in PC-relative mode and save with a PRG extension.
; Program Function:
; This is a LSR program that establishes a user defined trap #9. It may
; be executed from the desktop, but you may prefer to copy it to the AUTO
; folder of your boot partition or floppy disk so that it will execute
; automatically during boot.
; This is a special, single-purpose trap that is used to illustrate a
; method of preventing corruption of the adaptive algorithm.
program_start: ; Calculate program size and retain result.
lea program_end, a3 ; Fetch program end address.
suba.l 4(a7), a3 ; Subtract basepage address.
enter_supervisor_mode:
move.l #0, -(sp) ; The zero turns on supervisor mode.
move.w #$20, -(sp) ; Function = super = GEMDOS $20.
trap #1 ; Supervisor stack pointer (SSP) returned in D0.
addq.l #6, sp
movea.l d0, a5 ; Save SSP in scratch register.
install_trap_9_routine: ; Note: pointer = vector = pointer.
lea trap_9_routine, a0 ; Fetch address of trap #9 routine.
move.l a0, $A4 ; Store trap address at pointer address.
enter_user_mode:
pea (a5) ; Restore supervisor stack pointer.
move.w #$20, -(sp) ; Function = super = GEMDOS $20.
trap #1
addq.l #6, sp
relinquish_processor_control: ; Maintain memory residency.
move.w #0, -(sp) ; Exit code.
move.l a3, -(sp) ; Program size.
move.w #$31, -(sp) ; Function = ptermres = GEMDOS $31.
trap #1
trap_9_routine:
; The custom trap expects the following parameters to be contained in the
; specified registers prior to invocation:
; A0 = The address of a null terminated header to precede the reported
; AUT execution time.
; A1 = The address of a null terminated header to precede the reported
; AUT requisite memory.
; A2 = The address of a data area for the algorithm to be tested. The ending
; address for the data area must be the address stored in register A3;
; that is, it must be the starting address for the algorithm.
; A3 = The starting address of an algorithm containing one or more instructions.
; This algorithm becomes the AUT upon invocation of the trap.
; A4 = The ending address of the AUT.
; A5 = At the initial invocation by a specific program, A5 is expected to
; contain the address of a null terminated heading that is a brief
; description of the data that is to be reported by the trap. During
; all subsequent invocations by that program, A5 must contain the
; address of a null string.
; D7 = A word length decimal value which specifies the number of times the
; the AUT is to be executed in a loop in order to generate the execution
; time data. Register D7 is the only register which cannot be used by
; the AUT. That's because D7 is used as the AUT execution loop counter.
calculate_length_of_AUT_plus_data_area:
suba.l a2, a4 ; Subtract data start address from AUT start
; address.
calculate_length_of_AUT_data_area:
suba.l a2, a3 ; Subtract AUT start address from AUT end
; address.
save_heading_addresses:
lea time_header, a6 ; Header to precede reported execution time.
move.l a0, (a6)
lea memory_header, a6 ; Header to precede reported requisite memory.
move.l a1, (a6)
lea heading, a6 ; Null terminated heading or NULL string.
move.l a5, (a6)
transfer_AUT_start_address: ; Because contents of A2 will be altered soon.
movea.l a2, a5
transfer_stack_pointer:
lea _SSP, a0 ; Fetch address of declared variable.
move.l a7, (a0) ; Save system stack pointer.
move USP, a7 ; Transfer to invoking program's stack.
print_heading:
movea.l heading, a0 ; Fetch heading address.
tst.b (a0) ; NULL string test.
beq.s print_time_header ; Branch if null.
bsr print_string
print_time_header:
movea.l time_header, a0 ; Fetch address of time header.
bsr print_string
move_AUT:
lea reserved_space, a6 ; Fetch address of reserved space.
move.w a4, d3 ; AUT length initializes counter for copy.
lea memory, a0 ; Store AUT's requisite memory.
move.w d3, d0 ; Copy total AUT length to D0 for correction.
sub.w a3, d0 ; Subtract length of data area.
move.w d0, (a0) ; AUT length stored for report.
subq.w #1, d3 ; Initialize counter for loop execution.
move_loop: ; Move data area and AUT, byte by byte, to the
move.b (a5)+, (a6)+ ; area declared as "reserved_space".
dbra d3, move_loop ; Branch until D3 = -1.
move_adaptive_algorithm:
lea algorithm_end, a0 ; Calculate number of bytes in this program
lea program_end, a1 ; which must be copied into "reserved_space",
suba.l a0, a1 ; then initialize counter D3 with the number
move.w a1, d3 ; of bytes to copy.
subq.w #4, d3 ; Subtract the DBRA requisite memory.
movea.l a0, a1 ; Calculate amount the DBRA instruction must
suba.l a6, a1 ; be moved and save the new location in scratch
movea.l a6, a5 ; register for displacement correction.
move.w 2(a0), d0 ; Fetch current DBRA displacement value.
move.l (a0)+, (a6)+ ; Move the DBRA instruction.
; In this version of the adaptive algorithm, when the DBRA displacement is
; corrected, the size of the AUT's data space must be added to the
; correction so that the branch will go to the start of the AUT's executable
; statements, not to the start of the data space.
add.w a3, d0 ; Add length of AUT's data space.
add.w a1, d0 ; Correct DBRA displacement to the value it
move.w d0, 2(a5) ; should be after the move.
subq.w #1, d3 ; Initialize counter for bulk copy.
_move_loop: ; Move rest of algorithm in a bulk copy.
move.b (a0)+, (a6)+
dbra d3, _move_loop
subq.w #1, d7 ; Initialize D7 for AUT loop execution.
lea start_time, a0 ; Fetch address of declared variable.
suba.l a1, a0 ; Correct address to after-move location.
; NOTE: Since processor is in supervisor mode, there is no reason to use
; trap #3 to fetch time. The 200hz vector can be referenced directly.
move.l $4BA, (a0) ; Fetch and store start time.
algorithm_start:
reserved_space: ds.b 1024 ; Space for loop construction = 1024 bytes.
algorithm_end:
dbra d7, algorithm_start
move.l $4BA, d0 ; Fetch end time.
sub.l start_time, d0 ; Subtract start time from end time.
trap #10 ; Convert to decimal milliseconds and print.
print_memory_header:
movea.l memory_header, a0 ; Fetch address of memory header.
bsr.s print_string
print_requisite_memory:
move.w memory, d1 ; Fetch AUT's requisite memory.
trap #4 ; Returns address of decimal string in A0.
bsr.s print_string
lea header_1, a0 ; Print requisite memory units label.
bsr.s print_string
movea.l _SSP, a7 ; Restore system stack pointer so that it
rte ; points to the system stack.
;
; SUBROUTINES
;
print_string:
pea (a0)
move.w #9, -(sp)
trap #1
addq.l #6, sp
rts
data
header_1: dc.b ' bytes',$D,$A,0
bss
align
_SSP: ds.l 1 ; Address of the system stack will be stored here.
start_time: ds.l 1 ; Variable for time at start of AUT processing.
memory: ds.w 1 ; Variable for AUT's requisite memory.
time_header: ds.l 1 ; Points to address of header which is to precede
; the reported AUT processing time.
memory_header: ds.l 1 ; Points to address of header which is to precede
; the reported AUT requisite memory.
heading: ds.l 1 ; Null terminated heading or NULL string.
program_end: ds.l 0
end
Figure 7.8. AUT_DATA's execution results.
AUT_DATA Program Results
Elapsed time for lea label(pc), Am
move.l An, (Am): 205 milliseconds
Memory required for first algorithm: 6 bytes
Elapsed time for lea label, Am
move.l An, (Am): 230 milliseconds
Memory required for second algorithm: 8 bytes
Elapsed time for move.l An, label: 205 milliseconds
Memory required for third algorithm: 6 bytes
Figure 7.9. Disassembly listing. The first section is a
complete listing of the adaptive algorithm during invocation
by the first AUT. Only partial listings are shown for the
invocations by the second and third AUTs. All listings were
prepared before execution of the adaptive algorithm's
performance loop.
AUT_DATA's 1st AUT in TRAP9DAT
01FA36 99CA SUBA.L A2,A4
01FA38 97CA SUBA.L A2,A3
01FA3A 4DFA04C4 LEA $1FF00(PC),A6
01FA3E 2C88 MOVE.L A0,(A6)
01FA40 4DFA04C2 LEA $1FF04(PC),A6
01FA44 2C89 MOVE.L A1,(A6)
01FA46 4DFA04C0 LEA $1FF08(PC),A6
01FA4A 2C8D MOVE.L A5,(A6)
01FA4C 2A4A MOVEA.L A2,A5
01FA4E 41FA04A6 LEA $1FEF6(PC),A0
01FA52 208F MOVE.L A7,(A0)
01FA54 4E6F MOVE.L USP,A7
01FA56 207A04B0 MOVEA.L $1FF08(PC),A0
01FA5A 4A10 TST.B (A0)
01FA5C 6704 BEQ.S $1FA62
01FA5E 61000480 BSR $1FEE0
01FA62 207A049C MOVEA.L $1FF00(PC),A0
01FA66 61000478 BSR $1FEE0
01FA6A 4DFA004C LEA $1FAB8(PC),A6
01FA6E 360C MOVE.W A4,D3
01FA70 41FA048C LEA $1FEFE(PC),A0
01FA74 3003 MOVE.W D3,D0
01FA76 904B SUB.W A3,D0
01FA78 3080 MOVE.W D0,(A0)
01FA7A 5343 SUBQ.W #1,D3
01FA7C 1CDD MOVE.B (A5)+,(A6)+
01FA7E 51CBFFFC DBRA D3,$1FA7C
01FA82 41FA0434 LEA $1FEB8(PC),A0
01FA86 43FA0484 LEA $1FF0C(PC),A1
01FA8A 93C8 SUBA.L A0,A1
01FA8C 3609 MOVE.W A1,D3
01FA8E 5943 SUBQ.W #4,D3
01FA90 2248 MOVEA.L A0,A1
01FA92 93CE SUBA.L A6,A1
01FA94 2A4E MOVEA.L A6,A5
01FA96 30280002 MOVE.W 2(A0),D0
01FA9A 2CD8 MOVE.L (A0)+,(A6)+
01FA9C D04B ADD.W A3,D0
01FA9E D049 ADD.W A1,D0
01FAA0 3B400002 MOVE.W D0,2(A5)
01FAA4 5343 SUBQ.W #1,D3
01FAA6 1CD8 MOVE.B (A0)+,(A6)+
01FAA8 51CBFFFC DBRA D3,$1FAA6
01FAAC 5347 SUBQ.W #1,D7
01FAAE 41FA044A LEA $1FEFA(PC),A0
01FAB2 91C9 SUBA.L A1,A0
01FAB4 20B804BA MOVE.L $4BA,(A0)
01FAB8 6004 BRA.S $1FABE
01FABA 00000000 ORI.B #0,D0
01FABE 45FAFFFA LEA $1FABA(PC),A2
01FAC2 2488 MOVE.L A0,(A2)
01FAC4 51CFFFF8 DBRA D7,$1FABE
01FAC8 203804BA MOVE.L $4BA,D0
01FACC 90BA0038 SUB.L $1FB06(PC),D0
01FAD0 4E4A TRAP #$A
01FAD2 207A003C MOVEA.L $1FB10(PC),A0
01FAD6 6114 BSR.S $1FAEC
01FAD8 323A0030 MOVE.W $1FB0A(PC),D1
01FADC 4E44 TRAP #4
01FADE 610C BSR.S $1FAEC
01FAE0 41FA0016 LEA $1FAF8(PC),A0
01FAE4 6106 BSR.S $1FAEC
01FAE6 2E7A001A MOVEA.L $1FB02(PC),A7
01FAEA 4E73 RTE
01FAEC 4850 PEA (A0)
01FAEE 3F3C0009 MOVE.W #9,-(A7)
01FAF2 4E41 TRAP #1
01FAF4 5C8F ADDQ.L #6,A7
01FAF6 4E75 RTS
01FAF8 2020 MOVE.L -(A0),D0
01FAFA 6279 BHI.S $1FB75
01FAFC 7465 MOVEQ #$65,D2
01FAFE 730D DC.W $730D
01FB00 0A000004 EORI.B #4,D0
01FB04 0EB000000000 BCLR #0,0(A0,D0.W)
01FB0A 00060006 ORI.B #6,D6
01FB0E FB75 DC.W $FB75
01FB10 0006FBC9 ORI.B #-$37,D6
01FB14 0006FB58 ORI.B #$58,D6
01FB18 040A DC.W $40A
01FB1A 00000000 ORI.B #0,D0
01FB1E 00000000 ORI.B #0,D0
01FEAE 00000000 ORI.B #0,D0
01FEB2 00000000 ORI.B #0,D0
01FEB6 000051CF ORI.B #-$31,D0
01FEBA FBFE DC.W $FBFE
01FEBC 203804BA MOVE.L $4BA,D0
01FEC0 90BA0038 SUB.L $1FEFA(PC),D0
01FEC4 4E4A TRAP #$A
01FEC6 207A003C MOVEA.L $1FF04(PC),A0
01FECA 6114 BSR.S $1FEE0
01FECC 323A0030 MOVE.W $1FEFE(PC),D1
01FED0 4E44 TRAP #4
01FED2 610C BSR.S $1FEE0
01FED4 41FA0016 LEA $1FEEC(PC),A0
01FED8 6106 BSR.S $1FEE0
01FEDA 2E7A001A MOVEA.L $1FEF6(PC),A7
01FEDE 4E73 RTE
01FEE0 4850 PEA (A0)
01FEE2 3F3C0009 MOVE.W #9,-(A7)
01FEE6 4E41 TRAP #1
01FEE8 5C8F ADDQ.L #6,A7
01FEEA 4E75 RTS
01FEEC 2020 MOVE.L -(A0),D0
01FEEE 6279 BHI.S $1FF69
01FEF0 7465 MOVEQ #$65,D2
01FEF2 730D DC.W $730D
01FEF4 0A000004 EORI.B #4,D0
01FEF8 0EB000000000 BCLR #0,0(A0,D0.W)
01FEFE 00060006 ORI.B #6,D6
01FF02 FB75 DC.W $FB75
01FF04 0006FBC9 ORI.B #-$37,D6
01FF08 0006FB58 ORI.B #$58,D6
AUT_DATA's 2nd AUT in TRAP9DAT
01FAAE 41FA044A LEA $1FEFA(PC),A0
01FAB2 91C9 SUBA.L A1,A0
01FAB4 20B804BA MOVE.L $4BA,(A0)
01FAB8 6004 BRA.S $1FABE
01FABA 00000000 ORI.B #0,D0
01FABE 45F90006FB40 LEA $6FB40,A2
01FAC4 2488 MOVE.L A0,(A2)
01FAC6 51CFFFF6 DBRA D7,$1FABE
01FACA 203804BA MOVE.L $4BA,D0
AUT_DATA's 3rd AUT in TRAP9DAT
01FAAE 41FA044A LEA $1FEFA(PC),A0
01FAB2 91C9 SUBA.L A1,A0
01FAB4 20B804BA MOVE.L $4BA,(A0)
01FAB8 6004 BRA.S $1FABE
01FABA 00000000 ORI.B #0,D0
01FABE 23C80006FB4E MOVE.L A0,$6FB4E
01FAC4 51CFFFF8 DBRA D7,$1FABE
01FAC8 203804BA MOVE.L $4BA,D0
In figure 7.9, you can see that only in the performance
loop for the first AUT does the AUT reference an address
that is located within the adaptive algorithm's space. The
instructions in the other two AUT's reference an address
that is located within the invoking program's space.
Knowing how the declared data space will be referenced, you
can choose the addressing mode that forces the reference to
be performed in the manner most suited to specific
applications. The most important difference between this
adaptive algorithm and the previous is that this one permits
instructions to write to a data area without disrupting the
adaptive algorithm.
Wrapping It Up
I am going to wrap up this chapter with two more
comparative programs. Both programs were executed after the
custom traps were installed with CUSTOM.PRG. Program 40 was
designed to confirm or refute the statement on page 113 of
the Stan Kelly-Bootie book which asserts that MOVEA.Z
#<label>, A0 is equivalent to LEA <label>, A0, and faster.
Program 40 should be assembled in PC-relative mode, saved
with a TOS extension; then the name of the file should be
changed to LEAMOVER in the assempro editor, and that file
should be assembled in Relocatable mode. Figure 7.10 shows
the results of LEAMOVEA.TOS; figure 7.11 shows those of
LEAMOVER.TOS.
Program 40. A program that compares the execution speed and
requisite memory of two instructions that load an address
into an address register.
; Program Name: LEAMOVEA.S
; Version: 1.001
; Assembly Instructions:
; Assemble with ASSEMPRO and save with a TOS extension.
; Execution Instructions:
; Execute from the desktop; or execute SPAWN.TTP, type LEAMOVEA.TOS on
; its command line and view this program's output in LEAMOVEA.DAT.
; Program Function:
; Confirms (or refutes) the statement on page 113 of the Stan Kelly-
; Bootle book, to wit: MOVEA.Z #<label>, A0 is equivalent to LEA, and
; faster.
; For each of the two methods of loading the address of a string into
; and address register, calculates the memory occupied by each method, and
; calculates the execution time in milliseconds required for 50,000
; executions of each.
; PROGRAM RESULTS:
; When the program containing the two instructions under discussion
; is assembled in AssemPro's Relocatable mode, the instructions are
; equivalent in speed and requisite memory.
; However, when the program is assembled in AssemPro's PC-relative
; mode, the LEA instruction is faster and requires less memory. The claim
; that the MOVEA.Z instruction is faster than the LEA instruction is,
; therefore, soundly refuted.
; But, when the program is assembled in PC-relative mode, the MOVEA.Z
; instruction does not move the correct value into the address register,
; so there is really no choice when either pc-relative addressing or
; PC-relative assembly is used--LEA must be used then.
calculate_program_size:
lea -$102(pc), a1 ; Fetch basepage start address.
lea program_end, a0 ; Fetch program end = array address.
trap #6 ; Return unused memory to op system.
lea stack, a7
initialize_registers_1:
lea header_1, a0
lea header_2, a1
lea lea_start, a3
lea lea_end, a4
lea heading, a5
move.w #50000, d7
trap #9
initialize_registers_2:
lea header_3, a0
lea header_4, a1
lea movea_start, a3
lea movea_end, a4
lea heading, a5
move.b #0, (a5) ; Store a NULL in first byte to create a
move.w #50000, d7 ; null string so that heading gets printed
trap #9 ; only once.
terminate:
trap #8
lea_start:
lea newline, a2 ; Instruction in the loop.
lea_end:
movea_start:
movea.l #newline, a2 ; Instruction in the loop.
movea_end:
data
heading: dc.b 'LEAMOVEA Execution Results',$D,$A,$D,$A,0
header_1: dc.b ' Elapsed time for LEA: ',0
header_2: dc.b ' Memory required for LEA: ',0
header_3: dc.b $D,$A,' Elapsed time for MOVEA: ',0
header_4: dc.b ' Memory required for MOVEA: ',0
newline: dc.b $D,$A,0
bss
align
ds.l 96
stack: ds.l 0
program_end: ds.l 0
end
Figure 7.10. Execution results for LEAMOVEA.TOS.
LEAMOVEA Execution Results
Elapsed time for LEA: 130 milliseconds
Memory required for LEA: 4 bytes
Elapsed time for MOVEA: 155 milliseconds
Memory required for MOVEA: 6 bytes
Figure 7.11. Execution results for LEAMOVER.TOS.
LEAMOVER Execution Results
Elapsed time for LEA: 155 milliseconds
Memory required for LEA: 6 bytes
Elapsed time for MOVEA: 155 milliseconds
Memory required for MOVEA: 6 bytes
The data shown in figure 7.11 confirms that the
execution speed and requisite memory for the two
instructions are equivalent when the program in which they
reside is assembled in the Relocatable mode. However, the
data in figure 7.10 shows that when the program is assembled
in PC-relative mode, or when pc-relative addressing is used
in the source operand of the LEA instruction, that
instruction is faster and uses less memory.
But there is more to this comparison then is readily
apparent. Figure 7.12 is a partial disassembly of the
adaptive algorithm during invocation of the two PC-relative
AUT's. There you can see the futility and danger of trying
to use the movea instruction to load an address in a PC-
relative assembled program. The address that is actually
loaded is located beyond the space of both the invoking
program and the adaptive algorithm. Figure 7.13 shows that
either instruction can be used to load an address that is
located within the invoking algorithm's space.
Figure 7.12. Partial disassembly of the adaptive algorithm
during invocation by the two AUT's in LEAMOVEA.TOS.
LEAMOVEA.TOS partial disassembly in trap
LEAMOVEA.TOS was prepared by assembling LEAMOVEA.S in AssemPro's PC-relative
mode. The address being loaded into A2 by the LEA instruction is located
within the space of the adaptive algorithm.
017BCA 45FA009F LEA $17C6B(PC),A2
017BCE 51CFFFFA DBRA D7,$17BCA
The address being loaded into A2 by the MOVEA instruction decimal 233 and is
located beyond the space of the adaptive algorithm and that of the invoking
program.
017BCA 247C000000E9 MOVEA.L #$E9,A2
017BD0 51CFFFF8 DBRA D7,$17BCA
Figure 7.13. Partial disassembly of the adaptive algorithm
during invocation by the two AUT's in LEAMOVER.TOS.
LEAMOVER.TOS partial disassembly in trap
LEAMOVER.TOS was prepared by assembling LEAMOVEA.S in AssemPro's Relocatable
mode. The address being loaded into A2 by the LEA instruction is located
within the space of the invoking algorithm.
017BCA 45F90006D6AF LEA $6D6AF,A2
017BD0 51CFFFF8 DBRA D7,$17BCA
The address being loaded into A2 by the MOVEA instruction is located within
the space of the invoking algorithm.
017BCA 247C0006D6AF MOVEA.L #$6D6AF,A2
017BD0 51CFFFF8 DBRA D7,$17BCA
Based on the data produced and shown in figures 7.10,
7.11, 7.12 and 7.13, I conclude that Mr. Kelly-Bootle's
claim is soundly refuted, and furthermore, when the program
in which the instructions reside is assembled in PC-relative
mode, the MOVEA instruction does not move the correct value
into the address register, so there is really no choice when
either pc-relative addressing or PC-relative assembly is
used. In either of those two cases, the LEA instruction
must be used.
Just One More, I Promise
Program 41 compares pea (a0) to move.l a0, -(sp). I am
including this program for two reasons. The first reason is
to confirm that they are equivalent; the second is to give
you an example of the kind of information that is contained
in the Thomas P. Skinner book--the kind of information that
prompted me to recommend it. The results of program 41's
execution follow the listing. Personally, I think that the
PEA instruction is the more elegant.
Program 41. Compares two ways of pushing the contents
of an address register onto the stack.
; Program Name: PEA_MOVE.S
; Version: 1.001
; Assembly Instructions:
; Assemble in PC-relative mode and save with a TOS extension.
; Program Function:
; Compares the relative speed and memory requirements of "pea (a0)" to
; that of "move.l a0, -(sp)". The execution time reported is for
; 50,000 executions of each instruction.
; Execution Note:
; Invokes traps that are installed by CUSTOM.PRG during boot.
calculate_program_size:
lea -$102(pc), a1 ; Fetch basepage start address.
lea program_end, a0 ; Fetch program end = array address.
adda.l #200512, a0 ; Add in extra large stack space.
movea.l a0, a7 ; Point A7 to this program's stack.
trap #6 ; Return unused memory to op system.
initialize_registers_1:
lea header_1, a0
lea header_2, a1
lea pea_algorithm_start, a3
lea pea_algorithm_end, a4
lea heading, a5
move.w #50000, d7
trap #9
initialize_registers_2:
lea header_3, a0
lea header_4, a1
lea move_algorithm_start, a3
lea move_algorithm_end, a4
lea heading, a5
move.b #0, (a5) ; Store a NULL in first byte to create a
move.w #50000, d7 ; null string so that heading gets printed
trap #9 ; only once.
terminate:
trap #8
pea_algorithm_start:
pea (a0) ; Instruction in the loop.
pea_algorithm_end:
move_algorithm_start:
move.l a0, -(sp) ; Instruction in the loop.
move_algorithm_end:
data
heading: dc.b "PEA_MOVE Program Results",$D,$A,$D,$A,0
header_1: dc.b " Elapsed time for pea (a0): ",0
header_2: dc.b " Memory required for pea (a0): ",0
header_3: dc.b $D,$A," Elapsed time for move.l a0, -(sp): ",0
header_4: dc.b " Memory for move.l a0, -(sp) ",0
bss
align
program_end: ds.l 0
end
PEA_MOVE Program Results
Elapsed time for pea (a0): 155 milliseconds
Memory required for pea (a0): 2 bytes
Elapsed time for move.l a0, -(sp): 155 milliseconds
Memory for move.l a0, -(sp) 2 bytes
Conclusion
The thrust of this chapter was twofold. First, I
wanted to present an algorithm that would permit shorter
versions of programs designed to obtain performance or
comparative data. Second, in doing that, I wanted to
illustrate the inherent power of self-modifying algorithms.
There are many more things that I could say about such
algorithms, but I can't allow myself to be drawn into that
provocative subject--lest I abandon the rest of the book. I
leave you with this thought. Suppose that you learned to do
truly powerful things with your personal computer. With
whom would you share your secrets? Is it possible that you
might even discourage others from traveling the routes that
led to your own success?